[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814232#comment-15814232 ] Surendra Singh Lilhore commented on HDFS-11163: --- Thanks [~cnauroth] and [~szetszwo] for the comments.
bq. Indeed, the additional RPC is not needed since HdfsLocatedFileStatus already has the resolved storage policy. We don't need to call getStoragePolicy again.
Yes, HdfsLocatedFileStatus has the resolved storage policy. I am calling {{getStoragePolicy}} because it returns the default policy in the case of {{BLOCK_STORAGE_POLICY_ID_UNSPECIFIED}}.
{code}
public BlockStoragePolicy getPolicy(byte id) {
  // id == 0 means policy not specified.
  return id == 0 ? getDefaultPolicy() : policies[id];
}
{code}
We could add an API to get the default policy from the NameNode, so we can avoid a {{getStoragePolicy}} RPC per file.
> Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
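The fallback discussed above can be sketched as follows. This is a minimal, self-contained illustration (not the actual BlockStoragePolicySuite class): an id of 0 ({{BLOCK_STORAGE_POLICY_ID_UNSPECIFIED}}) resolves to the default policy, so a mover that knows the default up front would not need a per-file {{getStoragePolicy}} RPC. The policy names and the default id of 7 (HOT) mirror HDFS conventions but are assumptions here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of client-side policy resolution: id == 0 means
 * "unspecified", which falls back to the default policy, mirroring
 * BlockStoragePolicySuite#getPolicy quoted in the comment above.
 */
public class PolicyResolver {
    static final byte UNSPECIFIED = 0;
    static final byte DEFAULT_POLICY_ID = 7; // HOT in HDFS; an assumption here

    private final Map<Byte, String> policies = new HashMap<>();

    PolicyResolver() {
        // Illustrative subset of storage policies.
        policies.put((byte) 7, "HOT");
        policies.put((byte) 5, "WARM");
        policies.put((byte) 2, "COLD");
    }

    /** Resolve a policy id taken from HdfsLocatedFileStatus; 0 => default. */
    String resolve(byte id) {
        return id == UNSPECIFIED
            ? policies.get(DEFAULT_POLICY_ID)
            : policies.get(id);
    }

    public static void main(String[] args) {
        PolicyResolver r = new PolicyResolver();
        System.out.println(r.resolve((byte) 0)); // unspecified -> default (HOT)
        System.out.println(r.resolve((byte) 2)); // explicit id -> COLD
    }
}
```

If the default policy were fetched from the NameNode once at mover startup, the same resolution could then be done locally for every file in the iteration.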
[jira] [Updated] (HDFS-11150) [SPS]: Provide persistence when satisfying storage policy.
[ https://issues.apache.org/jira/browse/HDFS-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanbo Liu updated HDFS-11150: -- Attachment: HDFS-11150-HDFS-10285.006.patch Uploaded v6 patch: 1. Rebased my patch. 2. Deleted the code change of {{StoragePolicySatisfier}}. > [SPS]: Provide persistence when satisfying storage policy. > -- > > Key: HDFS-11150 > URL: https://issues.apache.org/jira/browse/HDFS-11150 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Yuanbo Liu >Assignee: Yuanbo Liu > Attachments: HDFS-11150-HDFS-10285.001.patch, > HDFS-11150-HDFS-10285.002.patch, HDFS-11150-HDFS-10285.003.patch, > HDFS-11150-HDFS-10285.004.patch, HDFS-11150-HDFS-10285.005.patch, > HDFS-11150-HDFS-10285.006.patch, editsStored, editsStored.xml > > > Provide persistence for SPS in case that Hadoop cluster crashes by accident. > Basically we need to change EditLog and FsImage here.
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814077#comment-15814077 ] Tsz Wo Nicholas Sze commented on HDFS-11163: > ... It will require an additional getStoragePolicy RPC per file with the > default storage policy, whereas previously there was no RPC for those files. > ... Indeed, the additional RPC is not needed since HdfsLocatedFileStatus already has the resolved storage policy. We don't need to call getStoragePolicy again. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code}
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814033#comment-15814033 ] Hadoop QA commented on HDFS-9391: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 54s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 
{color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 294 unchanged - 1 fixed = 294 total (was 295) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 15s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_121. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}162m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | JDK v1.7.0_121 Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | | hadoop.hdfs.TestEncryptionZones | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:b59b8b7 | | JIRA Issue | HDFS-9391 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846485/HDFS-9391-branch-2.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 67c8a669255d 3.13.0-96-generic #143-Ubuntu SMP M
[jira] [Comment Edited] (HDFS-11072) Add ability to unset and change directory EC policy
[ https://issues.apache.org/jira/browse/HDFS-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814020#comment-15814020 ] SammiChen edited comment on HDFS-11072 at 1/10/17 6:08 AM: --- Thanks [~andrew.wang] ! I agree with the general comment. Will upload a new patch to address point 1&2. was (Author: sammi): Thanks [~andrew.wang] ! I agree with the general comment. Will upload a new update the address point 1&2. > Add ability to unset and change directory EC policy > --- > > Key: HDFS-11072 > URL: https://issues.apache.org/jira/browse/HDFS-11072 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11072-v1.patch, HDFS-11072-v2.patch, > HDFS-11072-v3.patch, HDFS-11072-v4.patch, HDFS-11072-v5.patch, > HDFS-11072-v6.patch, HDFS-11072-v7.patch > > > Since the directory-level EC policy simply applies to files at create time, > it makes sense to make it more similar to storage policies and allow changing > and unsetting the policy.
[jira] [Commented] (HDFS-11072) Add ability to unset and change directory EC policy
[ https://issues.apache.org/jira/browse/HDFS-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814020#comment-15814020 ] SammiChen commented on HDFS-11072: -- Thanks [~andrew.wang] ! I agree with the general comment. Will upload a new patch to address points 1 & 2. > Add ability to unset and change directory EC policy > --- > > Key: HDFS-11072 > URL: https://issues.apache.org/jira/browse/HDFS-11072 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11072-v1.patch, HDFS-11072-v2.patch, > HDFS-11072-v3.patch, HDFS-11072-v4.patch, HDFS-11072-v5.patch, > HDFS-11072-v6.patch, HDFS-11072-v7.patch > > > Since the directory-level EC policy simply applies to files at create time, > it makes sense to make it more similar to storage policies and allow changing > and unsetting the policy.
[jira] [Updated] (HDFS-11072) Add ability to unset and change directory EC policy
[ https://issues.apache.org/jira/browse/HDFS-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11072: - Attachment: HDFS-11072-v7.patch > Add ability to unset and change directory EC policy > --- > > Key: HDFS-11072 > URL: https://issues.apache.org/jira/browse/HDFS-11072 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11072-v1.patch, HDFS-11072-v2.patch, > HDFS-11072-v3.patch, HDFS-11072-v4.patch, HDFS-11072-v5.patch, > HDFS-11072-v6.patch, HDFS-11072-v7.patch > > > Since the directory-level EC policy simply applies to files at create time, > it makes sense to make it more similar to storage policies and allow changing > and unsetting the policy.
[jira] [Commented] (HDFS-11306) Print remaining edit logs from buffer if edit log can't be rolled.
[ https://issues.apache.org/jira/browse/HDFS-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813865#comment-15813865 ] Yongjun Zhang commented on HDFS-11306: -- Hi [~jojochuang], Thanks much for working on this issue! Some comments on the patch:
# Suggest printing a summary WARN message at the beginning of {{dumpRemainingEditLogs()}}, stating something like "The edits buffer should have been flushed, but there are still unflushed transactions. Below is the list of the unflushed transactions:".
# Add a finally block and call
{code}
IOUtils.cleanup(LOG, dis);
IOUtils.cleanup(LOG, bis);
{code}
# Can we add a couple more different edits in the test?
Thanks.
> Print remaining edit logs from buffer if edit log can't be rolled. > -- > > Key: HDFS-11306 > URL: https://issues.apache.org/jira/browse/HDFS-11306 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-11306.001.patch > > > In HDFS-10943 [~yzhangal] reported that edit log can not be rolled due to > unexpected edit logs lingering in the buffer. > Unable to root cause the bug, I propose that we dump the remaining edit logs > in the buffer into namenode log, before crashing namenode. Use this new > capability to find the ops that sneaks into the buffer unexpectedly, and > hopefully catch the bug. > This effort is orthogonal, but related to HDFS-11292, which adds additional > informational logs to help debug this issue.
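The suggested cleanup pattern can be sketched as below. This is a hypothetical, self-contained stand-in (not the actual patch): {{closeQuietly}} plays the role of Hadoop's {{IOUtils.cleanup(LOG, ...)}}, and the byte-reading loop stands in for decoding buffered edit ops, so the streams are released even if the dump fails partway through.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.IOException;

/**
 * Sketch of the review suggestion: close the streams in a finally block so
 * a dumpRemainingEditLogs()-style method cannot leak them on an early
 * return or exception. Names here are illustrative, not from the patch.
 */
public class EditBufferDump {
    /** Stand-in for IOUtils.cleanup(LOG, c): close and swallow (log) errors. */
    static void closeQuietly(Closeable c) {
        if (c == null) return;
        try {
            c.close();
        } catch (IOException ignored) {
            // IOUtils.cleanup would log this rather than rethrow.
        }
    }

    /** Returns how many buffered "ops" (here: bytes) were dumped. */
    static int dumpRemaining(byte[] buffered) {
        ByteArrayInputStream bis = null;
        DataInputStream dis = null;
        int count = 0;
        try {
            bis = new ByteArrayInputStream(buffered);
            dis = new DataInputStream(bis);
            while (dis.available() > 0) {
                dis.readByte(); // stand-in for decoding one edit op
                count++;
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            // The finally block guarantees cleanup on every exit path.
            closeQuietly(dis);
            closeQuietly(bis);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(dumpRemaining(new byte[]{1, 2, 3})); // 3
    }
}
```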
[jira] [Commented] (HDFS-9935) Remove LEASE_{SOFTLIMIT,HARDLIMIT}_PERIOD and unused import from HdfsServerConstants
[ https://issues.apache.org/jira/browse/HDFS-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813846#comment-15813846 ] Yiqun Lin commented on HDFS-9935: - I took a quick look at this in branch-2 and branch-2.8, and I found these two variables are also no longer used there. Does this mean it is safe to commit? > Remove LEASE_{SOFTLIMIT,HARDLIMIT}_PERIOD and unused import from > HdfsServerConstants > > > Key: HDFS-9935 > URL: https://issues.apache.org/jira/browse/HDFS-9935 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-9935.001.patch, HDFS-9935.002.patch > > > In HDFS-9134, it has moved the > {{LEASE_SOFTLIMIT_PERIOD}},{{LEASE_HARDLIMIT_PERIOD}} constants from > {{HdfsServerConstants}} to {{HdfsConstants}} because these two constants are > used by {{DFSClient}} which is moved to {{hadoop-hdfs-client}}. And constants > in {{HdfsConstants}} can be both used by client and server side. In addition, > I have checked that these two constants in {{HdfsServerConstants}} has > already not been used in project now and were all replaced by > {{HdfsConstants.LEASE_SOFTLIMIT_PERIOD}},{{HdfsConstants.LEASE_HARDLIMIT_PERIOD}}. > So I think we can remove these unused constant values in > {{HdfsServerConstants}} completely. Instead of we can use them in > {{HdfsConstants}} if we want to use them in the future.
[jira] [Commented] (HDFS-11150) [SPS]: Provide persistence when satisfying storage policy.
[ https://issues.apache.org/jira/browse/HDFS-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813791#comment-15813791 ] Yuanbo Liu commented on HDFS-11150: --- [~umamaheswararao] Sure. Thanks for your reminder. > [SPS]: Provide persistence when satisfying storage policy. > -- > > Key: HDFS-11150 > URL: https://issues.apache.org/jira/browse/HDFS-11150 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Yuanbo Liu >Assignee: Yuanbo Liu > Attachments: HDFS-11150-HDFS-10285.001.patch, > HDFS-11150-HDFS-10285.002.patch, HDFS-11150-HDFS-10285.003.patch, > HDFS-11150-HDFS-10285.004.patch, HDFS-11150-HDFS-10285.005.patch, > editsStored, editsStored.xml > > > Provide persistence for SPS in case that Hadoop cluster crashes by accident. > Basically we need to change EditLog and FsImage here.
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813708#comment-15813708 ] Manoj Govindassamy commented on HDFS-9391: -- Thanks [~mingma]. Attached branch-2.01.patch. Thanks for the review. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, > HDFS-9391-branch-2.01.patch, HDFS-9391.01.patch, HDFS-9391.02.patch, > HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance webUI.png > >
[jira] [Updated] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-9391: - Attachment: HDFS-9391-branch-2.01.patch > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, > HDFS-9391-branch-2.01.patch, HDFS-9391.01.patch, HDFS-9391.02.patch, > HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance webUI.png > >
[jira] [Commented] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813659#comment-15813659 ] Hudson commented on HDFS-11273: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11095 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11095/]) HDFS-11273. Move TransferFsImage#doGetUrl function to a Util class. (jing9: rev 7ec609b28989303fe0cc36812f225028b0251b32) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Util.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HttpPutFailedException.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ImageServlet.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HttpGetFailedException.java > Move TransferFsImage#doGetUrl function to a Util class > -- > > Key: HDFS-11273 > URL: https://issues.apache.org/jira/browse/HDFS-11273 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11273.000.patch, HDFS-11273.001.patch, > HDFS-11273.002.patch, HDFS-11273.003.patch, HDFS-11273.004.patch > > > TransferFsImage#doGetUrl downloads files from the specified url and stores > them in the specified storage location. HDFS-4025 plans to synchronize the > log segments in JournalNodes. If a log segment is missing from a JN, the JN > downloads it from another JN which has the required log segment. 
We need > TransferFsImage#doGetUrl and TransferFsImage#receiveFile to accomplish this. > So we propose to move the said functions to a Utility class so as to be able > to use it for JournalNode syncing as well, without duplication of code.
[jira] [Comment Edited] (HDFS-6874) Add GETFILEBLOCKLOCATIONS operation to HttpFS
[ https://issues.apache.org/jira/browse/HDFS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813646#comment-15813646 ] Weiwei Yang edited comment on HDFS-6874 at 1/10/17 2:52 AM: Chatted with [~clamb] offline; he mentioned he has not been close to HDFS for some time and suggested I find someone else to review. [~andrew.wang], would you help to review this? This is a continual effort to make GETFILEBLOCKLOCATIONS exposed consistently from the fs/httpfs/webhdfs interfaces. This is an old JIRA that has been open for some time, but I would like to pick it up and get it done. Thanks was (Author: cheersyang): Chatted with [~clamb] offline, he mentioned he was not near with HDFS for sometime and suggest me find someone else to review. [~andrew.wang], would you help to review this? This is a continual effort to make GETFILEBLOCKLOCATIONS exposed consistently from fs/httpfs/webhdfs interfaces, this is old JIRA opened for sometime but I would like to pick up and get it done. Thanks > Add GETFILEBLOCKLOCATIONS operation to HttpFS > - > > Key: HDFS-6874 > URL: https://issues.apache.org/jira/browse/HDFS-6874 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.1, 2.7.3 >Reporter: Gao Zhong Liang >Assignee: Weiwei Yang > Labels: BB2015-05-TBR > Attachments: HDFS-6874-1.patch, HDFS-6874-branch-2.6.0.patch, > HDFS-6874.02.patch, HDFS-6874.03.patch, HDFS-6874.04.patch, > HDFS-6874.05.patch, HDFS-6874.patch > > > GETFILEBLOCKLOCATIONS operation is missing in HttpFS, which is already > supported in WebHDFS. For the request of GETFILEBLOCKLOCATIONS in > org.apache.hadoop.fs.http.server.HttpFSServer, BAD_REQUEST is returned so far: > ... > case GETFILEBLOCKLOCATIONS: { > response = Response.status(Response.Status.BAD_REQUEST).build(); > break; > } >
[jira] [Updated] (HDFS-6874) Add GETFILEBLOCKLOCATIONS operation to HttpFS
[ https://issues.apache.org/jira/browse/HDFS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-6874: -- Fix Version/s: (was: 2.9.0) > Add GETFILEBLOCKLOCATIONS operation to HttpFS > - > > Key: HDFS-6874 > URL: https://issues.apache.org/jira/browse/HDFS-6874 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.1, 2.7.3 >Reporter: Gao Zhong Liang >Assignee: Weiwei Yang > Labels: BB2015-05-TBR > Attachments: HDFS-6874-1.patch, HDFS-6874-branch-2.6.0.patch, > HDFS-6874.02.patch, HDFS-6874.03.patch, HDFS-6874.04.patch, > HDFS-6874.05.patch, HDFS-6874.patch > > > GETFILEBLOCKLOCATIONS operation is missing in HttpFS, which is already > supported in WebHDFS. For the request of GETFILEBLOCKLOCATIONS in > org.apache.hadoop.fs.http.server.HttpFSServer, BAD_REQUEST is returned so far: > ... > case GETFILEBLOCKLOCATIONS: { > response = Response.status(Response.Status.BAD_REQUEST).build(); > break; > } >
[jira] [Commented] (HDFS-6874) Add GETFILEBLOCKLOCATIONS operation to HttpFS
[ https://issues.apache.org/jira/browse/HDFS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813646#comment-15813646 ] Weiwei Yang commented on HDFS-6874: --- Chatted with [~clamb] offline; he mentioned he has not been close to HDFS for some time and suggested I find someone else to review. [~andrew.wang], would you help to review this? This is a continual effort to make GETFILEBLOCKLOCATIONS exposed consistently from the fs/httpfs/webhdfs interfaces. This is an old JIRA that has been open for some time, but I would like to pick it up and get it done. Thanks > Add GETFILEBLOCKLOCATIONS operation to HttpFS > - > > Key: HDFS-6874 > URL: https://issues.apache.org/jira/browse/HDFS-6874 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.1, 2.7.3 >Reporter: Gao Zhong Liang >Assignee: Weiwei Yang > Labels: BB2015-05-TBR > Fix For: 2.9.0 > > Attachments: HDFS-6874-1.patch, HDFS-6874-branch-2.6.0.patch, > HDFS-6874.02.patch, HDFS-6874.03.patch, HDFS-6874.04.patch, > HDFS-6874.05.patch, HDFS-6874.patch > > > GETFILEBLOCKLOCATIONS operation is missing in HttpFS, which is already > supported in WebHDFS. For the request of GETFILEBLOCKLOCATIONS in > org.apache.hadoop.fs.http.server.HttpFSServer, BAD_REQUEST is returned so far: > ... > case GETFILEBLOCKLOCATIONS: { > response = Response.status(Response.Status.BAD_REQUEST).build(); > break; > } >
[jira] [Commented] (HDFS-7343) HDFS smart storage management
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813615#comment-15813615 ] Anu Engineer commented on HDFS-7343: bq. Also, it makes SSM more stable as these data stored in NN. When SSM node failure happened, we can simply launch another instance on another node. I do see now where this thought came from. However, I think that SSM should be able to stand independent and not rely on the Namenode. Here are some reasons I can think of. 1. SSM can be implemented with no changes to the NN, which makes for easier and faster development. 2. No added complexity in the Namenode. 3. Moving state from SSM to the Namenode makes SSM simpler, but makes the Namenode correspondingly more complicated. So while I have a better understanding of the motivation, making the NN store rules and metrics which are needed by SSM feels like the wrong choice. As I said earlier, if you want to run this in other scenarios then this dependency on the NN makes it hard. For example, someone is running SSM in the cloud and is using a cloud-native file system instead of the Namenode. bq. This brings in uncertainty to users. We can implement automatical rule-engine level throttle in Phase 2. If a rule is misbehaving, the NN will become slow, which itself brings uncertainty. But I am fine with the choice of postponing this to a later stage. Would you be able to count how many times a particular rule was triggered in a given time window? That would be useful to debug this issue. bq. Rule is the core part for SSM to function. For convenient and reliable consideration, it's better to store it in NN to keep SSM simple and stateless as suggested. Rules are a core part of SSM. So let us store them in SSM instead of storing them in the NN, or feel free to store them as a file on HDFS. Modifying the Namenode to store the config of some other service will make the Namenode a dumping ground for the config of all other services. bq. Yes, good question. 
We can support HA by many ways, for example, periodically checkpoint the data to HDFS or store the data in the same way as edit log. Sorry, I am unable to understand this response clearly. Are you now saying we will support HA? bq. First, we provide some verification mechanism when adding some rule. For example, we can give the user some warning when the candidate files of an action (such as move) exceeding some certain value. This is a classic time-of-check to time-of-use problem. When the rule gets written maybe there is no issue, but as the file count increases this becomes a problem. bq. Second, the execution state and other info related info can also be showed in the dashboard or queried. It's convenient for users to track the status and take actions accordingly. It's also very good to implement a timeout mechanism. Agreed, but have we not now introduced the uncertainty issue back into the solution? I thought we did not want to restrict the number of times a rule fires since that would introduce uncertainty. bq. HDFS client will bypass SSM when the query fails, then the client goes back to the original working flow. It has almost no effect on the existing I/O. So then the SSM rules are violated? How does it deal with that issue? Since you have to deal with SSM being down, why have the HDFS client talk to SSM in an I/O path at all? Why not just rely on the background SSM logic and on the rules doing the right thing? Thanks for sharing the graph, I appreciate it. 
> HDFS smart storage management > - > > Key: HDFS-7343 > URL: https://issues.apache.org/jira/browse/HDFS-7343 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Wei Zhou > Attachments: HDFS-Smart-Storage-Management-update.pdf, > HDFS-Smart-Storage-Management.pdf, move.jpg > > > As discussed in HDFS-7285, it would be better to have a comprehensive and > flexible storage policy engine considering file attributes, metadata, data > temperature, storage type, EC codec, available hardware capabilities, > user/application preference and etc. > Modified the title for re-purpose. > We'd extend this effort some bit and aim to work on a comprehensive solution > to provide smart storage management service in order for convenient, > intelligent and effective utilizing of erasure coding or replicas, HDFS cache > facility, HSM offering, and all kinds of tools (balancer, mover, disk > balancer and so on) in a large cluster.
[jira] [Commented] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813607#comment-15813607 ] Brahma Reddy Battula commented on HDFS-11308: - bq.nc is alias of ncat which does not have "-z" option in CentOS 7. Even Suse also same issue..Please have look at HDFS-3618. > NameNode doFence state judgment problem > --- > > Key: HDFS-11308 > URL: https://issues.apache.org/jira/browse/HDFS-11308 > Project: Hadoop HDFS > Issue Type: Bug > Components: auto-failover >Affects Versions: 2.7.1 > Environment: CentOS Linux release 7.1.1503 (Core) >Reporter: tangshangwen > > In our Cluster, I found some abnormal in ZKFC log > {noformat} > [2017-01-10T01:42:37.168+08:00] [INFO] > hadoop.ha.SshFenceByTcpPort.doFence(SshFenceByTcpPort.java 147) [Health > Monitor for NameNode at > xxx-xxx-172xxx.hadoop.xxx.com/xxx.xxx.172.xxx:8021-EventThread] : > Indeterminate response from trying to kill service. Verifying whether it is > running using nc... > [2017-01-10T01:42:37.234+08:00] [WARN] > hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z > xxx-xxx-172xx.hadoop.xx.com 8021 via ssh: StreamPumper for STDERR] : nc -z > xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: nc: invalid option -- 'z' > [2017-01-10T01:42:37.235+08:00] [WARN] > hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z > xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: StreamPumper for STDERR] : nc -z > xxx-xxx-17224.hadoop.xxx.com 8021 via ssh: Ncat: Try `--help' or man(1) ncat > for more information, usage options and help. QUITTING. 
> {noformat} > When nc fails, an exception occurs, the return value is 2, and we cannot > confirm that sshfence succeeded; this may lead to some problems > {code:title=SshFenceByTcpPort.java|borderStyle=solid} > rc = execCommand(session, "nc -z " + serviceAddr.getHostName() + > " " + serviceAddr.getPort()); > if (rc == 0) { > // the service is still listening - we are unable to fence > LOG.warn("Unable to fence - it is running but we cannot kill it"); > return false; > } else { > LOG.info("Verified that the service is down."); > return true; > } > {code}
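Since the `-z` flag is not portable across nc implementations (CentOS 7's ncat rejects it, as the log above shows), the liveness check could avoid shelling out entirely and probe the port with a plain TCP connect. This is a hypothetical sketch of that idea, not the actual SshFenceByTcpPort code; the class and method names are invented for illustration:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: returns true if something is listening on host:port.
// A direct TCP connect avoids depending on which nc/ncat variant is installed.
class PortProbe {
  public static boolean isListening(String host, int port, int timeoutMs) {
    try (Socket s = new Socket()) {
      s.connect(new InetSocketAddress(host, port), timeoutMs);
      return true;   // connect succeeded: the service is still up, fencing failed
    } catch (IOException e) {
      return false;  // refused or timed out: treat the service as down
    }
  }

  public static void main(String[] args) {
    // Probe a port that is almost certainly closed on localhost.
    System.out.println(PortProbe.isListening("127.0.0.1", 1, 1000));
  }
}
```

Unlike parsing nc's exit status, every connect failure (refused or timed out) maps to "service down", which mirrors the intent of the rc != 0 branch in the snippet above without any ambiguous third exit code.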
[jira] [Commented] (HDFS-7967) Reduce the performance impact of the balancer
[ https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813598#comment-15813598 ] Hadoop QA commented on HDFS-7967: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 7s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 34s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 
{color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 366 unchanged - 9 fixed = 371 total (was 375) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 55s{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_121. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}159m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:b59b8b7 | | JIRA Issue | HDFS-7967 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846448/HDFS-7967.branch-2.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux b8546401be71 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision
[jira] [Updated] (HDFS-11268) Persist erasure coding policy ID in FSImage
[ https://issues.apache.org/jira/browse/HDFS-11268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-11268: --- Priority: Critical (was: Major) Component/s: erasure-coding Issue Type: Bug (was: Improvement) Hi [~Sammi], have you made any progress on this issue? I'm changing this to Bug and Critical, since it seems important for EC users. > Persist erasure coding policy ID in FSImage > --- > > Key: HDFS-11268 > URL: https://issues.apache.org/jira/browse/HDFS-11268 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 >Reporter: SammiChen >Assignee: SammiChen >Priority: Critical > Labels: hdfs-ec-3.0-must-do > > Currently, FSImage only has the information about whether the file is striped > or not. It doesn't save the erasure coding policy ID. Later, when the FSImage > is loaded to create the name space, the default system ec policy is used > as the file's ec policy. If the ec policy on the file is not the default ec > policy, then the content of the file cannot be accessed correctly.
[jira] [Commented] (HDFS-10759) Change fsimage bool isStriped from boolean to an enum
[ https://issues.apache.org/jira/browse/HDFS-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813585#comment-15813585 ] Andrew Wang commented on HDFS-10759: Thanks for the rev Ewan. Overall this looks really good, just nitpicky review comments. Since this is a big patch with a lot of mechanical changes, I'll try to review promptly to reduce rebase overhead. Some review comments: * It looks like checkstyle is unhappy; I think your IDE is set to wider than an 80-char line width, and there are some other issues too. * It'd be good to have a test that the default value of the proto enum is CONTIGUOUS as expected, per Jing's concern. * INodeFile#BLOCK_TYPE_MASK_CONTIGUOUS is unused. * BLOCK_ID_MASK_STRIPED could use a comment; I did a double take initially when I saw blockType was being compared against a variable named MASK, and it was the same value as BLOCK_ID_MASK. * BlockType could use some more unit tests with filled in lower values. > Change fsimage bool isStriped from boolean to an enum > - > > Key: HDFS-10759 > URL: https://issues.apache.org/jira/browse/HDFS-10759 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2 >Reporter: Ewan Higgs > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-10759.0001.patch, HDFS-10759.0002.patch, > HDFS-10759.0003.patch > > > The new erasure coding project has updated the protocol for fsimage such that > the {{INodeFile}} has a boolean '{{isStriped}}'. I think this is better as an > enum or integer since a boolean precludes any future block types. > For example: > {code} > enum BlockType { > CONTIGUOUS = 0, > STRIPED = 1, > } > {code} > We can also make this more robust to future changes where there are different > block types supported in a staged rollout. Here, we would use > {{UNKNOWN_BLOCK_TYPE}} as the first value since this is the default value. 
> See > [here|http://androiddevblog.com/protocol-buffers-pitfall-adding-enum-values/] > for more discussion. > {code} > enum BlockType { > UNKNOWN_BLOCK_TYPE = 0, > CONTIGUOUS = 1, > STRIPED = 2, > } > {code} > But I'm not convinced this is necessary since there are other enums that > don't use this approach.
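To illustrate the mask point from the review comments: in HDFS, striped blocks are allocated IDs from the negative range, so a striped-block mask coincides with the long's sign bit and the block type can be read straight off the ID. A simplified sketch of that relationship — illustrative only, not the patch's actual code:

```java
// Simplified sketch of deriving a block's type from its ID. Striped blocks
// are allocated IDs with the sign bit set (i.e. negative longs), so a single
// mask test distinguishes the two types.
class BlockTypeDemo {
  enum BlockType { CONTIGUOUS, STRIPED }

  static final long BLOCK_ID_MASK_STRIPED = 1L << 63;  // the sign bit

  static BlockType fromBlockId(long blockId) {
    return (blockId & BLOCK_ID_MASK_STRIPED) != 0
        ? BlockType.STRIPED : BlockType.CONTIGUOUS;
  }

  public static void main(String[] args) {
    System.out.println(fromBlockId(42L));   // CONTIGUOUS
    System.out.println(fromBlockId(-42L));  // STRIPED
  }
}
```

This is also why a comment on the mask constant helps: the same value serves both as an ID-range marker and as a type flag, which is exactly the double take described above.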
[jira] [Updated] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-11273: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 Status: Resolved (was: Patch Available) The latest patch looks good to me. The failed tests should be unrelated, and they passed on my local machine. +1 I've committed the patch to trunk. Thanks for the contribution, [~hkoneru]! > Move TransferFsImage#doGetUrl function to a Util class > -- > > Key: HDFS-11273 > URL: https://issues.apache.org/jira/browse/HDFS-11273 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11273.000.patch, HDFS-11273.001.patch, > HDFS-11273.002.patch, HDFS-11273.003.patch, HDFS-11273.004.patch > > > TransferFsImage#doGetUrl downloads files from the specified url and stores > them in the specified storage location. HDFS-4025 plans to synchronize the > log segments in JournalNodes. If a log segment is missing from a JN, the JN > downloads it from another JN which has the required log segment. We need > TransferFsImage#doGetUrl and TransferFsImage#receiveFile to accomplish this. > So we propose to move the said functions to a Utility class so as to be able > to use it for JournalNode syncing as well, without duplication of code.
[jira] [Commented] (HDFS-11303) Hedged read might hang infinitely if read data from all DN failed
[ https://issues.apache.org/jira/browse/HDFS-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813527#comment-15813527 ] Chen Zhang commented on HDFS-11303: --- Stack, thanks for your comments. Yes, the test is just to verify it; it hangs without the fix. > Hedged read might hang infinitely if read data from all DN failed > -- > > Key: HDFS-11303 > URL: https://issues.apache.org/jira/browse/HDFS-11303 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0-alpha1 >Reporter: Chen Zhang > Attachments: HDFS-11303-001.patch > > > Hedged read reads from one DN first; on timeout, it then reads from the other DNs > simultaneously. > If the reads from all DNs fail, this bug leaves the future-list non-empty (the > first timed-out request stays in the list), and the loop hangs infinitely.
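The hang can be sketched independently of DFSInputStream: if a loop waits for a futures list to drain but never removes a future that completed exceptionally, the list stays non-empty and the loop never terminates. Below is a minimal illustration of the fix direction — hypothetical names, not the HDFS client code itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: when a hedged request fails, its Future must still be removed from
// the tracking list; otherwise a loop that waits for the list to drain spins
// (or blocks) forever. Returns the first successful result, or -1 if all fail.
class HedgedReadSketch {
  static int firstSuccessfulRead(List<Callable<Integer>> requests) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, requests.size()));
    ExecutorCompletionService<Integer> ecs = new ExecutorCompletionService<>(pool);
    List<Future<Integer>> pending = new ArrayList<>();
    try {
      for (Callable<Integer> r : requests) {
        pending.add(ecs.submit(r));
      }
      while (!pending.isEmpty()) {
        Future<Integer> done = ecs.take();  // blocks for the next completion
        pending.remove(done);               // remove even on failure -- the crux
        try {
          return done.get();                // first success wins
        } catch (ExecutionException e) {
          // this datanode failed; fall through and wait for the others
        }
      }
      return -1;                            // every request failed
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      return -1;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

If the `pending.remove(done)` line ran only on the success path, a run where every request throws would leave `pending` non-empty while the completion queue is already drained, which is the shape of the hang described above.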
[jira] [Commented] (HDFS-7343) HDFS smart storage management
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813506#comment-15813506 ] Wei Zhou commented on HDFS-7343: {quote} though I'd encourage you to think about how far we can get with a stateless system (possibly by pushing more work into the NN and DN). {quote} From these words, I myself thought it's better to store this data in the NN to approach the stateless system suggested by Andrew. I did not mean that Andrew said it's better to do it this way. Sorry for my poor English. > HDFS smart storage management > - > > Key: HDFS-7343 > URL: https://issues.apache.org/jira/browse/HDFS-7343 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Wei Zhou > Attachments: HDFS-Smart-Storage-Management-update.pdf, > HDFS-Smart-Storage-Management.pdf, move.jpg > > > As discussed in HDFS-7285, it would be better to have a comprehensive and > flexible storage policy engine considering file attributes, metadata, data > temperature, storage type, EC codec, available hardware capabilities, > user/application preference, etc. > Modified the title to re-purpose. > We'd extend this effort somewhat and aim to work on a comprehensive solution > to provide a smart storage management service for convenient, > intelligent and effective use of erasure coding or replicas, the HDFS cache > facility, HSM offerings, and all kinds of tools (balancer, mover, disk > balancer and so on) in a large cluster.
[jira] [Comment Edited] (HDFS-11194) Maintain aggregated peer performance metrics on NameNode
[ https://issues.apache.org/jira/browse/HDFS-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813493#comment-15813493 ] Xiaobing Zhou edited comment on HDFS-11194 at 1/10/17 1:38 AM: --- Thank you [~arpitagarwal] for the patch. I've some comments. # RollingAverages#getStats is using non-rolling mean states, it should use rolling ones instead. See RollingAverages#snapshot for calculation of rolling averages. # DFS_METRICS_ROLLING_AVERAGES_WINDOW_SIZE_DEFAULT should be changed to the same naming with DFS_METRICS_ROLLING_AVERAGE_WINDOW_LENGTH_KEY # These parameters can be changed to be configurable. SlowNodeDetector#minOutlierDetectionPeers DataNodePeerMetrics#LOW_THRESHOLD_MS SlowPeerTracker#MAX_NODES_TO_REPORT # In BlockReceiver#receivePacket, after trackSendPacketToLastNodeInPipeline(duration); It may need to change a bit, e.g. if (duration > DataNodePeerMetrics#LOW_THRESHOLD_MS). or remove the warning msg at all. was (Author: xiaobingo): Thank you [~arpitagarwal] for the patch. I've some comments. # RollingAverages#getStats is using non-rolling mean states, it should use rolling ones instead. See RollingAverages#snapshot for calculation of rolling averages. # DFS_METRICS_ROLLING_AVERAGES_WINDOW_SIZE_DEFAULT should be changed to the same naming with DFS_METRICS_ROLLING_AVERAGE_WINDOW_LENGTH_KEY # These parameters can be changed to be configurable. SlowNodeDetector#minOutlierDetectionPeers DataNodePeerMetrics#LOW_THRESHOLD_MS SlowPeerTracker#MAX_NODES_TO_REPORT # In BlockReceiver#receivePacket, after trackSendPacketToLastNodeInPipeline(duration); It may need to change a bit, e.g. if (duration > DataNodePeerMetrics#LOW_THRESHOLD_MS). or remove the warning msg at all. 
> Maintain aggregated peer performance metrics on NameNode > > > Key: HDFS-11194 > URL: https://issues.apache.org/jira/browse/HDFS-11194 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Xiaobing Zhou >Assignee: Arpit Agarwal > Attachments: HDFS-11194-03-04.delta, HDFS-11194.01.patch, > HDFS-11194.02.patch, HDFS-11194.03.patch, HDFS-11194.04.patch > > > The metrics collected in HDFS-10917 should be reported to and aggregated on > the NameNode as part of heartbeat messages. This will make it easy to expose them > through JMX to users who are interested in them.
[jira] [Commented] (HDFS-11194) Maintain aggregated peer performance metrics on NameNode
[ https://issues.apache.org/jira/browse/HDFS-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813493#comment-15813493 ] Xiaobing Zhou commented on HDFS-11194: -- Thank you [~arpitagarwal] for the patch. I have some comments. # RollingAverages#getStats is using non-rolling mean states; it should use rolling ones instead. See RollingAverages#snapshot for the calculation of rolling averages. # DFS_METRICS_ROLLING_AVERAGES_WINDOW_SIZE_DEFAULT should be renamed to match DFS_METRICS_ROLLING_AVERAGE_WINDOW_LENGTH_KEY # These parameters can be changed to be configurable: SlowNodeDetector#minOutlierDetectionPeers DataNodePeerMetrics#LOW_THRESHOLD_MS SlowPeerTracker#MAX_NODES_TO_REPORT # In BlockReceiver#receivePacket, after trackSendPacketToLastNodeInPipeline(duration); it may need to change a bit, e.g. if (duration > DataNodePeerMetrics#LOW_THRESHOLD_MS), or remove the warning msg altogether. > Maintain aggregated peer performance metrics on NameNode > > > Key: HDFS-11194 > URL: https://issues.apache.org/jira/browse/HDFS-11194 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Xiaobing Zhou >Assignee: Arpit Agarwal > Attachments: HDFS-11194-03-04.delta, HDFS-11194.01.patch, > HDFS-11194.02.patch, HDFS-11194.03.patch, HDFS-11194.04.patch > > > The metrics collected in HDFS-10917 should be reported to and aggregated on > the NameNode as part of heartbeat messages. This will make it easy to expose them > through JMX to users who are interested in them.
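On the first review point, the difference between a cumulative mean and a rolling (windowed) mean can be shown with a small sketch. This is illustrative only — the real RollingAverages class is organized differently — but it captures why a metrics snapshot should report only the most recent window of samples:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a rolling (windowed) average. Only the most recent `window`
// samples contribute to mean(), which is what a metrics snapshot should
// report; a plain cumulative mean never forgets old data.
class RollingMean {
  private final int window;
  private final Deque<Double> samples = new ArrayDeque<>();
  private double windowSum = 0;

  RollingMean(int window) { this.window = window; }

  void add(double v) {
    samples.addLast(v);
    windowSum += v;
    if (samples.size() > window) {
      windowSum -= samples.removeFirst();  // evict the oldest sample
    }
  }

  double mean() {
    return samples.isEmpty() ? 0 : windowSum / samples.size();
  }
}
```

With a window of 3, adding 1, 2, 3 yields a mean of 2.0, and adding a slow outlier of 10 shifts the mean to 5.0 immediately, whereas a cumulative mean over all four samples would report 4.0 and react ever more sluggishly as samples accumulate.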
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813481#comment-15813481 ] Ming Ma commented on HDFS-9391: --- +1. Manoj, given the patch doesn't apply directly to branch-2, can you please provide another patch? Thanks. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance > webUI.png > >
[jira] [Commented] (HDFS-11072) Add ability to unset and change directory EC policy
[ https://issues.apache.org/jira/browse/HDFS-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813468#comment-15813468 ] Andrew Wang commented on HDFS-11072: I'm almost +1; thanks for the quick rev, Sammi. * In the new shell command, we get the EC policy before unsetting to check if there is a policy already set. {{unsetStoragePolicy}} and the Java {{unsetECPolicy}} API don't seem to error in this case, so I think we should just call unset without checking. This also requires a fix in the user docs. As a general comment, I prefer to surface errors on the NN rather than in the client code, for consistency between the Java and shell APIs. * Nit: "unexist" -> "non-existent" in the comments in testNonExistentDir > Add ability to unset and change directory EC policy > --- > > Key: HDFS-11072 > URL: https://issues.apache.org/jira/browse/HDFS-11072 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11072-v1.patch, HDFS-11072-v2.patch, > HDFS-11072-v3.patch, HDFS-11072-v4.patch, HDFS-11072-v5.patch, > HDFS-11072-v6.patch > > > Since the directory-level EC policy simply applies to files at create time, > it makes sense to make it more similar to storage policies and allow changing > and unsetting the policy.
[jira] [Commented] (HDFS-11305) libhdfs++: Log Datanode information when reading an HDFS block
[ https://issues.apache.org/jira/browse/HDFS-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813396#comment-15813396 ] Hadoop QA commented on HDFS-11305: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 33s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 55s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 26s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 32s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 49s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 49s{color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 31s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 3s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_121. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 80m 41s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:78fc6b6 | | JIRA Issue | HDFS-11305 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846443/HDFS-11305.HDFS-8707.001.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux ec886e0f1381 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / 2ceec2b | | Default Java | 1.7.0_121 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121 | | JDK v1.7.0_121 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18120/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18120/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > libhdfs++: Log Datanode information when reading an HDFS block > -- > > Key: HDFS-11305 > URL: https://issues.apache.org/jira/browse/HDFS-11305 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Xiaowei Zhu >Assignee: Xiaowei Zhu >
[jira] [Created] (HDFS-11309) chooseTargetTypeInSameNode should pass accurate block size to chooseStorage4Block while choosing target
Uma Maheswara Rao G created HDFS-11309: -- Summary: chooseTargetTypeInSameNode should pass accurate block size to chooseStorage4Block while choosing target Key: HDFS-11309 URL: https://issues.apache.org/jira/browse/HDFS-11309 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-10285 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently chooseTargetTypeInSameNode is not passing the accurate block size to chooseStorage4Block while choosing a local target. Instead of the accurate size we are passing 0, which effectively ignores the space constraint on the storage.
[jira] [Updated] (HDFS-11292) log lastWrittenTxId etc info in logSyncAll
[ https://issues.apache.org/jira/browse/HDFS-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-11292: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 2.8.0 Status: Resolved (was: Patch Available) > log lastWrittenTxId etc info in logSyncAll > -- > > Key: HDFS-11292 > URL: https://issues.apache.org/jira/browse/HDFS-11292 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: HDFS-11292.001.patch, HDFS-11292.002.patch, > HDFS-11292.003.patch > > > For the issue reported in HDFS-10943, even after HDFS-7964's fix is included, > the problem still exists; this means there might be some synchronization > issue. > To diagnose that, create this jira to report the lastWrittenTxId info in the > {{logSyncAll()}} call, such that we can compare against the error message > reported in HDFS-7964. > Specifically, there are two possibilities for the HDFS-10943 issue: > 1. {{logSyncAll()}} (statement A in the code quoted below) doesn't flush all > requested txs for some reason > 2. {{logSyncAll()}} does flush all requested txs, but some new txs sneaked > in between A and B. It's observed that the lastWrittenTxId in B and C are the > same. > This proposed reporting would help confirm whether 2 is true. > {code} > public synchronized void endCurrentLogSegment(boolean writeEndTxn) { > LOG.info("Ending log segment " + curSegmentTxId); > Preconditions.checkState(isSegmentOpen(), > "Bad state: %s", state); > if (writeEndTxn) { > logEdit(LogSegmentOp.getInstance(cache.get(), > FSEditLogOpCodes.OP_END_LOG_SEGMENT)); > } > // always sync to ensure all edits are flushed. > A. logSyncAll(); > B. printStatistics(true); > final long lastTxId = getLastWrittenTxId(); > try { > C. journalSet.finalizeLogSegment(curSegmentTxId, lastTxId); > editLogStream = null; > } catch (IOException e) { > // All journals have failed, it will be handled in logSync. > } > state = State.BETWEEN_LOG_SEGMENTS; > } > {code}
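The proposed logging boils down to reading lastWrittenTxId right after the sync (statement A) and again when finalizing (statement C); a mismatch between the two reads confirms possibility 2, that writers appended transactions in between. A toy model of that check — invented names, not FSEditLog itself:

```java
// Toy model of the race check described above: capture the last written txid
// right after syncing (A) and again at finalize time (C). A mismatch means
// other writers appended transactions in between -- possibility 2.
class TxIdRaceCheck {
  private long lastWrittenTxId = 0;

  synchronized void logEdit() { lastWrittenTxId++; }
  synchronized long getLastWrittenTxId() { return lastWrittenTxId; }

  /** Returns true if no transactions sneaked in between the two reads. */
  boolean endSegmentCleanly(Runnable workBetweenAandC) {
    long syncedUpTo = getLastWrittenTxId();  // A: what logSyncAll flushed
    workBetweenAandC.run();                  // whatever happens between A and C
    long finalizeAt = getLastWrittenTxId();  // C: the txid used for finalize
    return syncedUpTo == finalizeAt;
  }
}
```

In the real code the two reads are not atomic with respect to `logEdit`, which is exactly why logging both values is useful: it turns an intermittent "Bad state" failure into a directly observable txid delta.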
[jira] [Commented] (HDFS-11292) log lastWrittenTxId etc info in logSyncAll
[ https://issues.apache.org/jira/browse/HDFS-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813354#comment-15813354 ] Yongjun Zhang commented on HDFS-11292: -- Thanks [~jojochuang] very much for the review! I committed to trunk, branch-2 and branch-2.8.
> log lastWrittenTxId etc info in logSyncAll
> --
>
> Key: HDFS-11292
> URL: https://issues.apache.org/jira/browse/HDFS-11292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-11292.001.patch, HDFS-11292.002.patch, HDFS-11292.003.patch
>
>
> For the issue reported in HDFS-10943, even after HDFS-7964's fix is included, the problem still exists, which means there might be some synchronization issue.
> To diagnose that, this jira reports the lastWrittenTxId info in the {{logSyncAll()}} call, so that we can compare it against the error message reported in HDFS-7964.
> Specifically, there are two possibilities for the HDFS-10943 issue:
> 1. {{logSyncAll()}} (statement A in the code quoted below) doesn't flush all requested txs for some reason.
> 2. {{logSyncAll()}} does flush all requested txs, but some new txs sneaked in between A and B. It's observed that the lastWrittenTxId in B and C are the same.
> The proposed reporting would help confirm whether possibility 2 is true.
> {code}
> public synchronized void endCurrentLogSegment(boolean writeEndTxn) {
>   LOG.info("Ending log segment " + curSegmentTxId);
>   Preconditions.checkState(isSegmentOpen(), "Bad state: %s", state);
>   if (writeEndTxn) {
>     logEdit(LogSegmentOp.getInstance(cache.get(), FSEditLogOpCodes.OP_END_LOG_SEGMENT));
>   }
>   // always sync to ensure all edits are flushed.
> A.  logSyncAll();
> B.  printStatistics(true);
>   final long lastTxId = getLastWrittenTxId();
>   try {
> C.  journalSet.finalizeLogSegment(curSegmentTxId, lastTxId);
>     editLogStream = null;
>   } catch (IOException e) {
>     // All journals have failed, it will be handled in logSync.
>   }
>   state = State.BETWEEN_LOG_SEGMENTS;
> }
> {code}
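The race being probed can be sketched in plain Java. This is a hypothetical stand-in, not the actual FSEditLog code: capture the last written tx id before and after the sync-all; a difference means new transactions sneaked in between statements A and B (possibility 2 above). Names like `syncAllAndCheck` are illustrative only.

```java
// Hypothetical sketch, NOT the real FSEditLog: illustrates the proposed
// diagnostic of comparing lastWrittenTxId across the logSyncAll() window.
class EditLogSyncDiagnostics {
    private long lastWrittenTxId;

    long getLastWrittenTxId() { return lastWrittenTxId; }

    // Stand-in for a concurrent writer appending one edit.
    void logEdit() { lastWrittenTxId++; }

    // Returns true when no new transactions arrived during the sync window.
    boolean syncAllAndCheck() {
        long before = getLastWrittenTxId();
        // ... the real logSyncAll() would flush everything up to 'before' here ...
        long after = getLastWrittenTxId();
        if (before != after) {
            System.out.println("logSyncAll window let in "
                + (after - before) + " new tx(s)");
        }
        return before == after;
    }
}
```

In the real code a second thread could call the writer between the two reads, which is exactly the situation the extra logging is meant to expose.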
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813330#comment-15813330 ] Manoj Govindassamy commented on HDFS-9391: -- Test failures are not related to the patch. They are passing locally for me.
> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 3.0.0-alpha1
> Reporter: Ming Ma
> Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, HDFS-9391.02.patch, HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance webUI.png
>
>
[jira] [Commented] (HDFS-11292) log lastWrittenTxId etc info in logSyncAll
[ https://issues.apache.org/jira/browse/HDFS-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813317#comment-15813317 ] Hudson commented on HDFS-11292: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11093 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11093/]) HDFS-11292. log lastWrittenTxId etc info in logSyncAll. Contributed by (yzhang: rev 603cbcd513a74c29e0e4ec9dc181ff08887d64a4) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
> log lastWrittenTxId etc info in logSyncAll
> --
>
> Key: HDFS-11292
> URL: https://issues.apache.org/jira/browse/HDFS-11292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-11292.001.patch, HDFS-11292.002.patch, HDFS-11292.003.patch
>
>
> For the issue reported in HDFS-10943, even after HDFS-7964's fix is included, the problem still exists, which means there might be some synchronization issue.
> To diagnose that, this jira reports the lastWrittenTxId info in the {{logSyncAll()}} call, so that we can compare it against the error message reported in HDFS-7964.
> Specifically, there are two possibilities for the HDFS-10943 issue:
> 1. {{logSyncAll()}} (statement A in the code quoted below) doesn't flush all requested txs for some reason.
> 2. {{logSyncAll()}} does flush all requested txs, but some new txs sneaked in between A and B. It's observed that the lastWrittenTxId in B and C are the same.
> The proposed reporting would help confirm whether possibility 2 is true.
> {code}
> public synchronized void endCurrentLogSegment(boolean writeEndTxn) {
>   LOG.info("Ending log segment " + curSegmentTxId);
>   Preconditions.checkState(isSegmentOpen(), "Bad state: %s", state);
>   if (writeEndTxn) {
>     logEdit(LogSegmentOp.getInstance(cache.get(), FSEditLogOpCodes.OP_END_LOG_SEGMENT));
>   }
>   // always sync to ensure all edits are flushed.
> A.  logSyncAll();
> B.  printStatistics(true);
>   final long lastTxId = getLastWrittenTxId();
>   try {
> C.  journalSet.finalizeLogSegment(curSegmentTxId, lastTxId);
>     editLogStream = null;
>   } catch (IOException e) {
>     // All journals have failed, it will be handled in logSync.
>   }
>   state = State.BETWEEN_LOG_SEGMENTS;
> }
> {code}
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813259#comment-15813259 ] Hadoop QA commented on HDFS-9391: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 276 unchanged - 1 fixed = 276 total (was 277) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 43s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}104m 30s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.server.namenode.TestStartup | | Timed out junit tests | org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-9391 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846427/HDFS-9391.04.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 1c07ed9ae4f4 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 91bf504 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18119/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18119/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18119/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues
[jira] [Commented] (HDFS-11306) Print remaining edit logs from buffer if edit log can't be rolled.
[ https://issues.apache.org/jira/browse/HDFS-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813250#comment-15813250 ] Hadoop QA commented on HDFS-11306: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | 
{color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 27s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 45s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}121m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestErasureCodeBenchmarkThroughput | | Timed out junit tests | org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration | | | org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11306 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846419/HDFS-11306.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 908f880b1514 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 91bf504 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18116/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18116/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18116/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18116/console | | Powered by | Apache
[jira] [Commented] (HDFS-11150) [SPS]: Provide persistence when satisfying storage policy.
[ https://issues.apache.org/jira/browse/HDFS-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813252#comment-15813252 ] Uma Maheswara Rao G commented on HDFS-11150: [~yuanbo], please proceed with your patch now. HDFS-11293 was committed just a few minutes ago.
> [SPS]: Provide persistence when satisfying storage policy.
> --
>
> Key: HDFS-11150
> URL: https://issues.apache.org/jira/browse/HDFS-11150
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, namenode
> Reporter: Yuanbo Liu
> Assignee: Yuanbo Liu
> Attachments: HDFS-11150-HDFS-10285.001.patch, HDFS-11150-HDFS-10285.002.patch, HDFS-11150-HDFS-10285.003.patch, HDFS-11150-HDFS-10285.004.patch, HDFS-11150-HDFS-10285.005.patch, editsStored, editsStored.xml
>
>
> Provide persistence for SPS in case the Hadoop cluster crashes by accident. Basically we need to change the EditLog and FsImage here.
[jira] [Updated] (HDFS-7967) Reduce the performance impact of the balancer
[ https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7967: -- Attachment: HDFS-7967.branch-2.002.patch HDFS-7967.branch-2.8.002.patch Use Collections.sort and add the remove() method for JDK 7 compliance. Apologies for so many posts; I'm having difficulty getting my local env to actually use JDK 7 for compiling.
> Reduce the performance impact of the balancer
> --
>
> Key: HDFS-7967
> URL: https://issues.apache.org/jira/browse/HDFS-7967
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 2.0.0-alpha
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Priority: Critical
> Attachments: HDFS-7967-branch-2.8.patch, HDFS-7967-branch-2.patch, HDFS-7967.branch-2-1.patch, HDFS-7967.branch-2.001.patch, HDFS-7967.branch-2.002.patch, HDFS-7967.branch-2.8-1.patch, HDFS-7967.branch-2.8.001.patch, HDFS-7967.branch-2.8.002.patch
>
>
> The balancer needs to query for blocks to move from overly full DNs. The block lookup is extremely inefficient. An iterator of the node's blocks is created from the iterators of its storages' blocks. A random number is chosen corresponding to how many blocks will be skipped via the iterator. Each skip requires costly scanning of triplets.
> The current design also only considers node imbalances while ignoring imbalances within the node's storages. A more efficient and intelligent design may eliminate the costly skipping of blocks via round-robin selection of blocks from the storages based on remaining capacity.
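The round-robin selection the description proposes could look roughly like the sketch below. This is a simplified illustration under stated assumptions — plain strings stand in for blocks, a queue per storage stands in for DatanodeStorageInfo, and queue length stands in for used capacity; it is not the actual Balancer code.

```java
import java.util.*;

// Hedged sketch of round-robin block selection across a node's storages:
// each pick is O(1) dequeue work, instead of randomly skipping through a
// merged iterator (the costly triplet scanning described above).
class RoundRobinPicker {
    static List<String> pick(Map<String, Deque<String>> blocksByStorage, int n) {
        List<String> order = new ArrayList<>(blocksByStorage.keySet());
        // Visit fuller storages first (here, queue length approximates fullness).
        order.sort(Comparator.comparingInt((String s) -> -blocksByStorage.get(s).size()));
        List<String> picked = new ArrayList<>();
        while (picked.size() < n) {
            boolean progressed = false;
            for (String s : order) {
                Deque<String> q = blocksByStorage.get(s);
                if (!q.isEmpty() && picked.size() < n) {
                    picked.add(q.pollFirst()); // O(1) per pick, no skipping
                    progressed = true;
                }
            }
            if (!progressed) break; // all storages exhausted
        }
        return picked;
    }
}
```

Cycling through the storages also spreads the moves across them, addressing the intra-node imbalance the description says the current design ignores.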
[jira] [Commented] (HDFS-11299) Support multiple Datanode File IO hooks
[ https://issues.apache.org/jira/browse/HDFS-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813212#comment-15813212 ] Hadoop QA commented on HDFS-11299: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | 
{color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 3s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 45s{color} | {color:orange} root: The patch generated 1 new + 578 unchanged - 0 fixed = 579 total (was 578) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 55s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 5s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}135m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11299 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846413/HDFS-11299.002.patch | | Optional Tests | asflicense mvnsite compile javac javadoc mvninstall unit findbugs checkstyle | | uname | Linux 8cb3224826df 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 91bf504 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18115/artifact/patchprocess/diff-checkstyle-root.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18115/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18115/testReport/ | | modules | C: hadoop-common-project/hadoop-common hadoop-hdf
[jira] [Commented] (HDFS-11209) SNN can't checkpoint when rolling upgrade is not finalized
[ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813207#comment-15813207 ] Hadoop QA commented on HDFS-11209: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 44s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 26s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 92m 36s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Timed out junit tests | org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11209 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846426/HDFS-11209.02.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 7f8814d53b92 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 91bf504 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18118/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18118/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18118/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > SNN can't checkpoint when rolling upgrade is not finalized > -- > > Key: HDFS-11209 > URL: https://issues.apache.org/jira/browse/HDFS-11209 >
[jira] [Updated] (HDFS-11305) libhdfs++: Log Datanode information when reading an HDFS block
[ https://issues.apache.org/jira/browse/HDFS-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaowei Zhu updated HDFS-11305: --- Attachment: HDFS-11305.HDFS-8707.001.patch Cleaned up whitespace.
> libhdfs++: Log Datanode information when reading an HDFS block
> --
>
> Key: HDFS-11305
> URL: https://issues.apache.org/jira/browse/HDFS-11305
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: Xiaowei Zhu
> Assignee: Xiaowei Zhu
> Priority: Minor
> Attachments: HDFS-11305.HDFS-8707.000.patch, HDFS-11305.HDFS-8707.001.patch
>
>
> The information can be logged at debug level and contain the hostname and IP address, along with the file path and offset. With this information, we can check things like rack locality, etc.
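For illustration only: libhdfs++ itself is C++, but the snippet below shows in Java the shape of the debug line the description asks for — which file and offset are being read from which datanode. The class and method names are hypothetical, not part of the libhdfs++ API.

```java
// Hypothetical helper: composes the datanode-info debug message described
// above (hostname, IP address, file path, offset).
class BlockReadLog {
    static String format(String host, String ip, String path, long offset) {
        return String.format(
            "Reading %s at offset %d from datanode %s (%s)",
            path, offset, host, ip);
    }
}
```

With such a line at debug level, one can correlate reads against the cluster topology to check things like rack locality.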
[jira] [Commented] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813155#comment-15813155 ] Hadoop QA commented on HDFS-11273: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 7 new + 124 unchanged - 9 fixed = 131 total (was 133) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 46s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}112m 18s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 | | Timed out junit tests | org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11273 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846411/HDFS-11273.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 545690ddc75b 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 91bf504 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18114/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18114/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18114/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18114/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT h
[jira] [Commented] (HDFS-11305) libhdfs++: Log Datanode information when reading an HDFS block
[ https://issues.apache.org/jira/browse/HDFS-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813153#comment-15813153 ] James Clampffer commented on HDFS-11305: Code looks good to me. Would you mind just clearing up whatever whitespace issue CI is warning about? > libhdfs++: Log Datanode information when reading an HDFS block > -- > > Key: HDFS-11305 > URL: https://issues.apache.org/jira/browse/HDFS-11305 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Xiaowei Zhu >Assignee: Xiaowei Zhu >Priority: Minor > Attachments: HDFS-11305.HDFS-8707.000.patch > > > The information can be logged at debug level and contain hostname, IP address, > file path, and offset. With this information, we can check things like > rack locality, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11305) libhdfs++: Log Datanode information when reading an HDFS block
[ https://issues.apache.org/jira/browse/HDFS-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813143#comment-15813143 ] Hadoop QA commented on HDFS-11305: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 47s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 19s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 49s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 53s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 18s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 32s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 32s{color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 2s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_121. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 75m 53s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:78fc6b6 | | JIRA Issue | HDFS-11305 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846421/HDFS-11305.HDFS-8707.000.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux e6193ddc9990 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / 2ceec2b | | Default Java | 1.7.0_121 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/18117/artifact/patchprocess/whitespace-eol.txt | | JDK v1.7.0_121 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18117/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18117/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > libhdfs++: Log Datanode information when reading an HDFS block > -- > > Key: HDFS-11305 > URL: https://issues.apache.o
[jira] [Commented] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813132#comment-15813132 ] Wei-Chiu Chuang commented on HDFS-11308: (and more portable) > NameNode doFence state judgment problem > --- > > Key: HDFS-11308 > URL: https://issues.apache.org/jira/browse/HDFS-11308 > Project: Hadoop HDFS > Issue Type: Bug > Components: auto-failover >Affects Versions: 2.7.1 > Environment: CentOS Linux release 7.1.1503 (Core) >Reporter: tangshangwen > > In our Cluster, I found some abnormal in ZKFC log > {noformat} > [2017-01-10T01:42:37.168+08:00] [INFO] > hadoop.ha.SshFenceByTcpPort.doFence(SshFenceByTcpPort.java 147) [Health > Monitor for NameNode at > xxx-xxx-172xxx.hadoop.xxx.com/xxx.xxx.172.xxx:8021-EventThread] : > Indeterminate response from trying to kill service. Verifying whether it is > running using nc... > [2017-01-10T01:42:37.234+08:00] [WARN] > hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z > xxx-xxx-172xx.hadoop.xx.com 8021 via ssh: StreamPumper for STDERR] : nc -z > xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: nc: invalid option -- 'z' > [2017-01-10T01:42:37.235+08:00] [WARN] > hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z > xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: StreamPumper for STDERR] : nc -z > xxx-xxx-17224.hadoop.xxx.com 8021 via ssh: Ncat: Try `--help' or man(1) ncat > for more information, usage options and help. QUITTING. 
> {noformat} > When I run nc an exception occurs and the return value is 2, so ZKFC cannot > confirm that sshfence succeeded; this may lead to problems. > {code:title=SshFenceByTcpPort.java|borderStyle=solid} > rc = execCommand(session, "nc -z " + serviceAddr.getHostName() + > " " + serviceAddr.getPort()); > if (rc == 0) { > // the service is still listening - we are unable to fence > LOG.warn("Unable to fence - it is running but we cannot kill it"); > return false; > } else { > LOG.info("Verified that the service is down."); > return true; > } > {code}
[jira] [Commented] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813129#comment-15813129 ] tangshangwen commented on HDFS-11308: - ok, thank you for your comments!
[jira] [Commented] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813118#comment-15813118 ] Wei-Chiu Chuang commented on HDFS-11308: I don't know if there is a standard or specification for command switches on Linux, but I am inclined to use a configuration property to replace the hard-coded "nc -z" command, so that this kind of issue is easier to work around.
[jira] [Commented] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813108#comment-15813108 ] Masatake Iwasaki commented on HDFS-11308: - On CentOS 7, nc is an alias for ncat, which does not have a "-z" option. There seem to be some replacements using bash: http://stackoverflow.com/questions/4922943/test-from-shell-script-if-remote-tcp-port-is-open
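A portable alternative to shelling out to `nc -z` would be to probe the port directly from Java, which sidesteps the netcat/ncat flag differences entirely. The sketch below is hypothetical (it is not part of any posted patch; the class and method names are invented for illustration):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PortCheck {
    /** Returns true if something is listening on host:port within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;          // connection succeeded: service still up
        } catch (IOException e) {
            return false;         // refused or timed out: treat as down
        }
    }

    public static void main(String[] args) throws IOException {
        // Open a throwaway local listener to demonstrate a positive result.
        try (ServerSocket srv = new ServerSocket(0)) {
            System.out.println(isListening("localhost", srv.getLocalPort(), 500));
        }
    }
}
```

Unlike the external `nc` call, a failed connect here is unambiguous: there is no third "tool misbehaved" exit status (such as ncat's rc of 2) to misinterpret as a successful fence.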
[jira] [Commented] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813104#comment-15813104 ] Masatake Iwasaki commented on HDFS-11308: - Since "nc -z" is used to check whether the process is still alive after failing to kill it with "fuser -k", you should check the reason for the failure. The SSH user must be the user running the namenode, or root, in order for fuser to work.
[jira] [Commented] (HDFS-11292) log lastWrittenTxId etc info in logSyncAll
[ https://issues.apache.org/jira/browse/HDFS-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813078#comment-15813078 ] Wei-Chiu Chuang commented on HDFS-11292: +1. The failed tests cannot be reproduced in my local tree. Thanks [~yzhangal]! > log lastWrittenTxId etc info in logSyncAll > -- > > Key: HDFS-11292 > URL: https://issues.apache.org/jira/browse/HDFS-11292 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-11292.001.patch, HDFS-11292.002.patch, > HDFS-11292.003.patch > > > For the issue reported in HDFS-10943, even after HDFS-7964's fix is included, > the problem still exists; this means there might be some synchronization > issue. > To diagnose that, this jira is created to report the lastWrittenTxId info in > the {{logSyncAll()}} call, such that we can compare against the error message > reported in HDFS-7964. > Specifically, there are two possibilities for the HDFS-10943 issue: > 1. {{logSyncAll()}} (statement A in the code quoted below) doesn't flush all > requested txs for some reason > 2. {{logSyncAll()}} does flush all requested txs, but some new txs sneaked > in between A and B. It's observed that the lastWrittenTxId in B and C are the > same. > This proposed reporting would help confirm whether 2 is true. > {code} > public synchronized void endCurrentLogSegment(boolean writeEndTxn) { > LOG.info("Ending log segment " + curSegmentTxId); > Preconditions.checkState(isSegmentOpen(), > "Bad state: %s", state); > if (writeEndTxn) { > logEdit(LogSegmentOp.getInstance(cache.get(), > FSEditLogOpCodes.OP_END_LOG_SEGMENT)); > } > // always sync to ensure all edits are flushed. > A. logSyncAll(); > B. printStatistics(true); > final long lastTxId = getLastWrittenTxId(); > try { > C. journalSet.finalizeLogSegment(curSegmentTxId, lastTxId); > editLogStream = null; > } catch (IOException e) { > // All journals have failed, it will be handled in logSync. > } > state = State.BETWEEN_LOG_SEGMENTS; > } > {code}
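Possibility 2 above (a transaction sneaking in between statements A and B) can be illustrated with a toy sketch. The `lastWrittenTxId` counter here is a hypothetical stand-in for the real FSEditLog state, and the comparison mirrors what the proposed logging would make visible:

```java
import java.util.concurrent.atomic.AtomicLong;

public class TxIdRace {
    // Hypothetical stand-in for FSEditLog's lastWrittenTxId counter.
    static final AtomicLong lastWrittenTxId = new AtomicLong(100);

    public static void main(String[] args) {
        long atSync = lastWrittenTxId.get();      // what logSyncAll() (A) flushed up to
        lastWrittenTxId.incrementAndGet();        // a new tx sneaks in before finalize
        long atFinalize = lastWrittenTxId.get();  // value read near C
        // Logging both values makes the discrepancy observable after the fact:
        System.out.println("synced=" + atSync
            + " finalized=" + atFinalize
            + " sneaked=" + (atFinalize != atSync));
        // prints: synced=100 finalized=101 sneaked=true
    }
}
```

If the two logged values ever differ in production, that would confirm possibility 2 rather than a flush failure in `logSyncAll()` itself.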
[jira] [Updated] (HDFS-11293) [SPS]: Local DN should be given preference as source node, when target available in same node
[ https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-11293: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-10285 Status: Resolved (was: Patch Available) > [SPS]: Local DN should be given preference as source node, when target > available in same node > - > > Key: HDFS-11293 > URL: https://issues.apache.org/jira/browse/HDFS-11293 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Yuanbo Liu >Assignee: Uma Maheswara Rao G >Priority: Critical > Fix For: HDFS-10285 > > Attachments: HDFS-11293-HDFS-10285-00.patch, > HDFS-11293-HDFS-10285-01.patch, HDFS-11293-HDFS-10285-02.patch > > > In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica > info by block pool id. But in this situation: > {code} > datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}. > 1. the same block replica exists in A[DISK] and B[DISK]. > 2. the block pool id of datanode A and datanode B are the same. > {code} > Then we start to change the file's storage policy and move the block replica > in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at > this time, datanode A throws ReplicaAlreadyExistsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11293) [SPS]: Local DN should be given preference as source node, when target available in same node
[ https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813069#comment-15813069 ] Uma Maheswara Rao G commented on HDFS-11293: I have just committed this to branch. Thanks [~yuanbo] and [~rakeshr] for reviews! Thanks [~yuanbo] for finding issue and sharing test cases.
[jira] [Commented] (HDFS-11028) libhdfs++: FileHandleImpl::CancelOperations needs to be able to cancel pending connections
[ https://issues.apache.org/jira/browse/HDFS-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813059#comment-15813059 ] James Clampffer commented on HDFS-11028: In case we needed another example of why "delete this" and tenuous resource management in C++ are a recipe for pain: it looks like this can leak memory if the FileSystem destructor is called while this waits on the asio dns resolver. The bug existed before this patch, but the cancel test executable in my patch provides a simple reproducer. Situation: In common/namenode_info.cc BulkResolve takes a set of NamenodeInfo objects and does a DNS lookup on each host to get a vector of endpoints. In order to be fast the function does an arbitrary number of async lookups in parallel and joins at the end to make the API reasonably simple to use. In order to keep track of multiple pipelines a std::vector of std::pair<Pipeline*, std::shared_ptr<std::promise<Status>>> is set up. Each pair represents a continuation pipeline that's doing the resolve work and the std::promise that will eventually contain the result status, assuming the continuation runs to completion. This seemed like a reasonable way to encapsulate async work using continuations that needed to be joined, but it turns out it's incredibly difficult to clean this up if it's been interrupted. -Can't simply call delete on the Pipeline pointers contained in the vector because the continuation may have already called "delete this"; if it has self-destructed the pointer remains non-null, so double deleting will break things. -Can't loop through the vector and call cancel on all the Pipelines because some may have been destructed via "delete this". If the malloc implementation is being generous the call might give a __cxa_pure_virtual error, but it's more likely to just trash the heap. -Can't check the status of the Pipeline because it's wrapped in a promise, so that will just block.
Possible fixes: -Add a pointer-to-a-flag to the continuation state so the pipeline can indicate it self destructed, make sure the ResolveContinuation can actually deal with cancel semantics. -Rewrite dns lookup by allocating memory correctly and calling asio functions. I'll file another jira to rewrite the dns resolution code as I don't think an issue that's been around for so long should block this. The temporary fix is to avoid deleting the FileSystem object immediately after cancel. The pipeline will clean itself up when the resolver returns, but it risks invalid writes if the vector of endpoints disappears since it's holding a back_inserter i.e. it's a dangling pointer issue obfuscated by piles of abstraction. > libhdfs++: FileHandleImpl::CancelOperations needs to be able to cancel > pending connections > -- > > Key: HDFS-11028 > URL: https://issues.apache.org/jira/browse/HDFS-11028 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-11028.HDFS-8707.000.patch, > HDFS-11028.HDFS-8707.001.patch, HDFS-11028.HDFS-8707.002.patch > > > Cancel support is now reasonably robust except the case where a FileHandle > operation ends up causing the RpcEngine to try to create a new RpcConnection. > In HA configs it's common to have something like 10-20 failovers and a 20 > second failover delay (no exponential backoff just yet). This means that all > of the functions with synchronous interfaces can still block for many minutes > after an operation has been canceled, and often the cause of this is > something trivial like a bad config file. > The current design makes this sort of thing tricky to do because the > FileHandles need to be individually cancelable via CancelOperations, but they > share the RpcEngine that does the async magic. > Updated design: > Original design would end up forcing lots of reconnects. 
Not a huge issue on > an unauthenticated cluster but on a kerberized cluster this is a recipe for > Kerberos thinking we're attempting a replay attack. > User visible cancellation and internal resources cleanup are separable > issues. The former can be implemented by atomically swapping the callback of > the operation to be canceled with a no-op callback. The original callback is > then posted to the IoService with an OperationCanceled status and the user is > no longer blocked. For RPC cancels this is sufficient, it's not expensive to > keep a request around a little bit longer and when it's eventually invoked or > timed out it invokes the no-op callback and is ignored (other than a trace > level log notification). Connect cancels push a flag down into the RPC > engine to kill the connection and make sure it doesn't attempt to reconnect.
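The callback-swap scheme in the updated design above can be sketched in a few lines. The Java below is a hypothetical illustration (class and method names invented, not libhdfs++ code): a single atomic exchange both unblocks the caller with an OperationCanceled status and turns the engine's eventual completion into a no-op.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class CancelableOp {
    // Holds the user's callback until it is either completed or canceled.
    private final AtomicReference<Consumer<String>> callback;

    CancelableOp(Consumer<String> userCallback) {
        callback = new AtomicReference<>(userCallback);
    }

    /** Invoked by the async engine whenever the operation finishes or times out. */
    void complete(String status) {
        callback.get().accept(status);
    }

    /** User-facing cancel: swap in a no-op and fire the original exactly once. */
    void cancel() {
        Consumer<String> original = callback.getAndSet(s -> { /* ignored */ });
        original.accept("OperationCanceled");
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        CancelableOp op = new CancelableOp(seen::add);
        op.cancel();                 // user cancels: callback fires immediately
        op.complete("Success");      // late engine completion hits the no-op
        System.out.println(seen);    // [OperationCanceled]
    }
}
```

This mirrors the separation described in the jira: the user-visible cancel is cheap and immediate, while the request object itself can linger in the engine until it times out and harmlessly invokes the no-op.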
[jira] [Commented] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813039#comment-15813039 ] tangshangwen commented on HDFS-11308: - Hi [~jojochuang], do you think we should check whether the return value is 1 to avoid this problem?
[jira] [Commented] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813035#comment-15813035 ] tangshangwen commented on HDFS-11308: - Ok, thank you ~ [~jojochuang]
[jira] [Commented] (HDFS-7967) Reduce the performance impact of the balancer
[ https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813031#comment-15813031 ] Hadoop QA commented on HDFS-7967: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 39s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 57s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 
{color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 25s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 26s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_121. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 26s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_121. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 366 unchanged - 9 fixed = 371 total (was 375) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 26s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 22s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 29s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_121. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}113m 46s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:b59b8b7 | | JIRA Issue | HDFS-7967 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846404/HDFS-7967.branch-2.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 357bacb0b6a2 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/pe
[jira] [Commented] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813026#comment-15813026 ] Wei-Chiu Chuang commented on HDFS-11308: I don't have a CentOS 7 box available. But it looks like a platform issue. The same bug was reported here: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1349384 Looks like the workaround is to use nmap. > NameNode doFence state judgment problem > --- > > Key: HDFS-11308 > URL: https://issues.apache.org/jira/browse/HDFS-11308 > Project: Hadoop HDFS > Issue Type: Bug > Components: auto-failover >Affects Versions: 2.7.1 > Environment: CentOS Linux release 7.1.1503 (Core) >Reporter: tangshangwen > > In our cluster, I found some abnormal messages in the ZKFC log > {noformat} > [2017-01-10T01:42:37.168+08:00] [INFO] > hadoop.ha.SshFenceByTcpPort.doFence(SshFenceByTcpPort.java 147) [Health > Monitor for NameNode at > xxx-xxx-172xxx.hadoop.xxx.com/xxx.xxx.172.xxx:8021-EventThread] : > Indeterminate response from trying to kill service. Verifying whether it is > running using nc... > [2017-01-10T01:42:37.234+08:00] [WARN] > hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z > xxx-xxx-172xx.hadoop.xx.com 8021 via ssh: StreamPumper for STDERR] : nc -z > xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: nc: invalid option -- 'z' > [2017-01-10T01:42:37.235+08:00] [WARN] > hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z > xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: StreamPumper for STDERR] : nc -z > xxx-xxx-17224.hadoop.xxx.com 8021 via ssh: Ncat: Try `--help' or man(1) ncat > for more information, usage options and help. QUITTING. 
> {noformat} > When I run nc, an exception occurs and the return value is 2, so we cannot > confirm that sshfence succeeded; this may lead to problems. > {code:title=SshFenceByTcpPort.java|borderStyle=solid} > rc = execCommand(session, "nc -z " + serviceAddr.getHostName() + > " " + serviceAddr.getPort()); > if (rc == 0) { > // the service is still listening - we are unable to fence > LOG.warn("Unable to fence - it is running but we cannot kill it"); > return false; > } else { > LOG.info("Verified that the service is down."); > return true; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
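[Editor's note] The ambiguity above comes from shelling out to nc, whose exit codes and flags differ across netcat variants (GNU, BSD, Ncat), as the `invalid option -- 'z'` error shows. As a sketch of the underlying check only — this is not code from Hadoop or from any attached patch — the same test can be done by opening a TCP connection directly, which leaves no third, indeterminate exit status to misread:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if something accepts TCP connections on host:port.
    // A refused or timed-out connection is treated as "not listening",
    // so there is no dependence on a particular nc variant's exit codes.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Note that SshFenceByTcpPort runs its check over ssh on the remote host, so a direct connect from the ZKFC machine is not a drop-in replacement; the sketch only illustrates how the reliance on nc's exit codes can be avoided.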
[jira] [Commented] (HDFS-11299) Support multiple Datanode File IO hooks
[ https://issues.apache.org/jira/browse/HDFS-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813013#comment-15813013 ] Xiaoyu Yao commented on HDFS-11299: --- Thanks [~hanishakoneru] for updating the patch and [~arpitagarwal] for the review. It looks like LEN_INT can be replaced with Integer.BYTES from the JDK, but Integer.BYTES is only available in JDK 1.8 and would cause more work when backporting to branch-2. I'm OK with patch v02. +1 pending Jenkins. > Support multiple Datanode File IO hooks > --- > > Key: HDFS-11299 > URL: https://issues.apache.org/jira/browse/HDFS-11299 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11299.000.patch, HDFS-11299.001.patch, > HDFS-11299.002.patch > > > HDFS-10958 introduces instrumentation hooks around DataNode disk IO and > HDFS-10959 adds support for profiling hooks to expose latency statistics. > Instead of choosing only one hook using Config parameters, we want to add two > separate hooks - one for profiling and one for fault injection. The fault > injection hook will be useful for testing purposes. > This jira only introduces support for fault injection hook. The > implementation for that will come later on. > Also, now Default and Counting FileIOEvents would not be needed as we can > control enabling the profiling and fault injection hooks using config > parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
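[Editor's note] For context on the Integer.BYTES remark: it was added in Java 8, but the same value has been derivable from Integer.SIZE since Java 5, which keeps a branch-2 (JDK 7) backport compiling. A minimal sketch — the constant name LEN_INT is taken from the comment above, and the surrounding class is invented for illustration:

```java
public class LenInt {
    // Integer.SIZE is the width in bits (32) and Byte.SIZE is bits per
    // byte (8), so this evaluates to 4 and compiles on JDK 7, unlike
    // Integer.BYTES, which is JDK 8+ only.
    static final int LEN_INT = Integer.SIZE / Byte.SIZE;

    public static void main(String[] args) {
        System.out.println(LEN_INT); // prints 4
    }
}
```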
[jira] [Updated] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-9391: - Attachment: HDFS-9391.04.patch Attached the v04 patch to fix the LeavingServiceStatus and to use outOfServiceReplica block counts in the DN UI pages. [~mingma], [~eddyxu], can you please take a look at this patch revision? > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance > webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11308) NameNode doFence state judgment problem
[ https://issues.apache.org/jira/browse/HDFS-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tangshangwen updated HDFS-11308: Description: In our cluster, I found some abnormal messages in the ZKFC log {noformat} [2017-01-10T01:42:37.168+08:00] [INFO] hadoop.ha.SshFenceByTcpPort.doFence(SshFenceByTcpPort.java 147) [Health Monitor for NameNode at xxx-xxx-172xxx.hadoop.xxx.com/xxx.xxx.172.xxx:8021-EventThread] : Indeterminate response from trying to kill service. Verifying whether it is running using nc... [2017-01-10T01:42:37.234+08:00] [WARN] hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z xxx-xxx-172xx.hadoop.xx.com 8021 via ssh: StreamPumper for STDERR] : nc -z xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: nc: invalid option -- 'z' [2017-01-10T01:42:37.235+08:00] [WARN] hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: StreamPumper for STDERR] : nc -z xxx-xxx-17224.hadoop.xxx.com 8021 via ssh: Ncat: Try `--help' or man(1) ncat for more information, usage options and help. QUITTING. {noformat} When I run nc, an exception occurs and the return value is 2, so we cannot confirm that sshfence succeeded; this may lead to problems. {code:title=SshFenceByTcpPort.java|borderStyle=solid} rc = execCommand(session, "nc -z " + serviceAddr.getHostName() + " " + serviceAddr.getPort()); if (rc == 0) { // the service is still listening - we are unable to fence LOG.warn("Unable to fence - it is running but we cannot kill it"); return false; } else { LOG.info("Verified that the service is down."); return true; } {code} was: In our Cluster, I found some abnormal in ZKFC log {noformat} [2017-01-10T01:42:37.168+08:00] [INFO] hadoop.ha.SshFenceByTcpPort.doFence(SshFenceByTcpPort.java 147) [Health Monitor for NameNode at xxx-xxx-172xxx.hadoop.xxx.com/xxx.xxx.172.xxx:8021-EventThread] : Indeterminate response from trying to kill service. Verifying whether it is running using nc... 
[2017-01-10T01:42:37.234+08:00] [WARN] hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z BJYF-Druid-17224.hadoop.jd.local 8021 via ssh: StreamPumper for STDERR] : nc -z xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: nc: invalid option -- 'z' [2017-01-10T01:42:37.235+08:00] [WARN] hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: StreamPumper for STDERR] : nc -z BJYF-Druid-17224.hadoop.jd.local 8021 via ssh: Ncat: Try `--help' or man(1) ncat for more information, usage options and help. QUITTING. {noformat} When I perform nc an exception occurs, the return value is 2, and cannot confirm sshfence success,this may lead to some problems {code:title=SshFenceByTcpPort.java|borderStyle=solid} rc = execCommand(session, "nc -z " + serviceAddr.getHostName() + " " + serviceAddr.getPort()); if (rc == 0) { // the service is still listening - we are unable to fence LOG.warn("Unable to fence - it is running but we cannot kill it"); return false; } else { LOG.info("Verified that the service is down."); return true; } {code} > NameNode doFence state judgment problem > --- > > Key: HDFS-11308 > URL: https://issues.apache.org/jira/browse/HDFS-11308 > Project: Hadoop HDFS > Issue Type: Bug > Components: auto-failover >Affects Versions: 2.7.1 > Environment: CentOS Linux release 7.1.1503 (Core) >Reporter: tangshangwen > > In our cluster, I found some abnormal messages in the ZKFC log > {noformat} > [2017-01-10T01:42:37.168+08:00] [INFO] > hadoop.ha.SshFenceByTcpPort.doFence(SshFenceByTcpPort.java 147) [Health > Monitor for NameNode at > xxx-xxx-172xxx.hadoop.xxx.com/xxx.xxx.172.xxx:8021-EventThread] : > Indeterminate response from trying to kill service. Verifying whether it is > running using nc... 
> [2017-01-10T01:42:37.234+08:00] [WARN] > hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z > xxx-xxx-172xx.hadoop.xx.com 8021 via ssh: StreamPumper for STDERR] : nc -z > xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: nc: invalid option -- 'z' > [2017-01-10T01:42:37.235+08:00] [WARN] > hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z > xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: StreamPumper for STDERR] : nc -z > xxx-xxx-17224.hadoop.xxx.com 8021 via ssh: Ncat: Try `--help' or man(1) ncat > for more information, usage options and help. QUITTING. > {noformat} > When I run nc, an exception occurs and the return value is 2, so we cannot > confirm that sshfence succeeded; this may lead to problems. > {code:title=SshFenceByTcpPort.java|borderStyle=solid} > rc = execCommand(session, "nc -z " + serviceAddr.getHostName() + > " " + serviceAd
[jira] [Created] (HDFS-11308) NameNode doFence state judgment problem
tangshangwen created HDFS-11308: --- Summary: NameNode doFence state judgment problem Key: HDFS-11308 URL: https://issues.apache.org/jira/browse/HDFS-11308 Project: Hadoop HDFS Issue Type: Bug Components: auto-failover Affects Versions: 2.7.1 Environment: CentOS Linux release 7.1.1503 (Core) Reporter: tangshangwen In our cluster, I found some abnormal messages in the ZKFC log {noformat} [2017-01-10T01:42:37.168+08:00] [INFO] hadoop.ha.SshFenceByTcpPort.doFence(SshFenceByTcpPort.java 147) [Health Monitor for NameNode at xxx-xxx-172xxx.hadoop.xxx.com/xxx.xxx.172.xxx:8021-EventThread] : Indeterminate response from trying to kill service. Verifying whether it is running using nc... [2017-01-10T01:42:37.234+08:00] [WARN] hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z BJYF-Druid-17224.hadoop.jd.local 8021 via ssh: StreamPumper for STDERR] : nc -z xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: nc: invalid option -- 'z' [2017-01-10T01:42:37.235+08:00] [WARN] hadoop.ha.SshFenceByTcpPort.pump(StreamPumper.java 88) [nc -z xxx-xxx-172xx.hadoop.xxx.com 8021 via ssh: StreamPumper for STDERR] : nc -z BJYF-Druid-17224.hadoop.jd.local 8021 via ssh: Ncat: Try `--help' or man(1) ncat for more information, usage options and help. QUITTING. {noformat} When I run nc, an exception occurs and the return value is 2, so we cannot confirm that sshfence succeeded; this may lead to problems. {code:title=SshFenceByTcpPort.java|borderStyle=solid} rc = execCommand(session, "nc -z " + serviceAddr.getHostName() + " " + serviceAddr.getPort()); if (rc == 0) { // the service is still listening - we are unable to fence LOG.warn("Unable to fence - it is running but we cannot kill it"); return false; } else { LOG.info("Verified that the service is down."); return true; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11307) The rpc to portmap service for NFS has hardcoded timeout.
Jitendra Nath Pandey created HDFS-11307: --- Summary: The rpc to portmap service for NFS has hardcoded timeout. Key: HDFS-11307 URL: https://issues.apache.org/jira/browse/HDFS-11307 Project: Hadoop HDFS Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Mukul Kumar Singh The NFS service makes an rpc call to the portmap, but the timeout is hardcoded. Tests on slow virtual machines sometimes fail due to the timeout. We should make the timeout configurable, with the same default as the current hardcoded value. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
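[Editor's note] A change like the one described usually follows a standard pattern: read the timeout from configuration and fall back to the previously hardcoded value, so behavior is unchanged unless the key is set. A sketch with invented names — the eventual patch will define its own key, and java.util.Properties stands in here for Hadoop's Configuration class:

```java
import java.util.Properties;

public class PortmapTimeout {
    // Hypothetical key and default for illustration only; the default
    // should equal the value that is currently hardcoded.
    static final String RPC_TIMEOUT_KEY = "nfs.portmap.rpc.timeout.ms";
    static final int RPC_TIMEOUT_DEFAULT = 500;

    // Falls back to the old hardcoded value when the key is unset.
    static int getRpcTimeoutMs(Properties conf) {
        String v = conf.getProperty(RPC_TIMEOUT_KEY);
        return v == null ? RPC_TIMEOUT_DEFAULT : Integer.parseInt(v);
    }
}
```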
[jira] [Commented] (HDFS-11209) SNN can't checkpoint when rolling upgrade is not finalized
[ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812983#comment-15812983 ] Xiaoyu Yao commented on HDFS-11209: --- I've also manually tested on a cluster with rolling upgrade (non-HA) from NN layout version 60 to 63 and verified that the patch fixes the checkpoint problem on the SNN. > SNN can't checkpoint when rolling upgrade is not finalized > -- > > Key: HDFS-11209 > URL: https://issues.apache.org/jira/browse/HDFS-11209 > Project: Hadoop HDFS > Issue Type: Bug > Components: rolling upgrades >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Critical > Attachments: HDFS-11209.00.patch, HDFS-11209.01.patch, > HDFS-11209.02.patch > > > A similar problem was fixed by HDFS-7185; a recent change in HDFS-8432 > brings it back. > With HDFS-8432, the primary NN will not update the VERSION file to the new > version after running with the "rollingUpgrade" option until the upgrade is > finalized. This is to support more downgrade use cases. > However, the checkpoint on the SNN incorrectly updates the VERSION file > when the rolling upgrade is not yet finalized. As a result, the SNN checkpoints > successfully but fails to push the image to the primary NN, because its version is > higher than the primary NN's, as shown below. 
> {code} > 2016-12-02 05:25:31,918 ERROR namenode.SecondaryNameNode > (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint > org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: > Image uploading failed, status: 403, url: > http://NN:50070/imagetransfer?txid=345404754&imageFile=IMAGE&File-Le..., > message: This namenode has storage info -60:221856466:1444080250181:clusterX > but the secondary expected -63:221856466:1444080250181:clusterX > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
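[Editor's note] The storage info strings in that 403 message appear to have the form layoutVersion:namespaceID:cTime:clusterID, so -60 versus -63 is a layout-version mismatch. A sketch of the comparison that rejects the upload — the field order is inferred only from the values quoted above, and this is not NameNode code:

```java
public class StorageInfoCheck {
    // Extracts the leading layout version from a storage-info string
    // such as "-60:221856466:1444080250181:clusterX".
    static int layoutVersion(String storageInfo) {
        return Integer.parseInt(storageInfo.split(":")[0]);
    }

    // An image upload is only accepted when both sides agree; here the
    // NN still reports -60 while the SNN already runs at -63.
    static boolean versionsMatch(String nnInfo, String snnInfo) {
        return layoutVersion(nnInfo) == layoutVersion(snnInfo);
    }
}
```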
[jira] [Updated] (HDFS-11209) SNN can't checkpoint when rolling upgrade is not finalized
[ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-11209: -- Attachment: HDFS-11209.02.patch Fixed the checkstyle issues and added a unit test. > SNN can't checkpoint when rolling upgrade is not finalized > -- > > Key: HDFS-11209 > URL: https://issues.apache.org/jira/browse/HDFS-11209 > Project: Hadoop HDFS > Issue Type: Bug > Components: rolling upgrades >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Critical > Attachments: HDFS-11209.00.patch, HDFS-11209.01.patch, > HDFS-11209.02.patch > > > A similar problem was fixed by HDFS-7185; a recent change in HDFS-8432 > brings it back. > With HDFS-8432, the primary NN will not update the VERSION file to the new > version after running with the "rollingUpgrade" option until the upgrade is > finalized. This is to support more downgrade use cases. > However, the checkpoint on the SNN incorrectly updates the VERSION file > when the rolling upgrade is not yet finalized. As a result, the SNN checkpoints > successfully but fails to push the image to the primary NN, because its version is > higher than the primary NN's, as shown below. > {code} > 2016-12-02 05:25:31,918 ERROR namenode.SecondaryNameNode > (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint > org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: > Image uploading failed, status: 403, url: > http://NN:50070/imagetransfer?txid=345404754&imageFile=IMAGE&File-Le..., > message: This namenode has storage info -60:221856466:1444080250181:clusterX > but the secondary expected -63:221856466:1444080250181:clusterX > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11299) Support multiple Datanode File IO hooks
[ https://issues.apache.org/jira/browse/HDFS-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812955#comment-15812955 ] Arpit Agarwal commented on HDFS-11299: -- Thanks [~hanishakoneru]. +1 pending Jenkins. Looks like Xiaoyu's feedback is also addressed but will hold off committing until tomorrow in case he has more feedback. > Support multiple Datanode File IO hooks > --- > > Key: HDFS-11299 > URL: https://issues.apache.org/jira/browse/HDFS-11299 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11299.000.patch, HDFS-11299.001.patch, > HDFS-11299.002.patch > > > HDFS-10958 introduces instrumentation hooks around DataNode disk IO and > HDFS-10959 adds support for profiling hooks to expose latency statistics. > Instead of choosing only one hook using Config parameters, we want to add two > separate hooks - one for profiling and one for fault injection. The fault > injection hook will be useful for testing purposes. > This jira only introduces support for fault injection hook. The > implementation for that will come later on. > Also, now Default and Counting FileIOEvents would not be needed as we can > control enabling the profiling and fault injection hooks using config > parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11306) Print remaining edit logs from buffer if edit log can't be rolled.
[ https://issues.apache.org/jira/browse/HDFS-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-11306: --- Summary: Print remaining edit logs from buffer if edit log can't be rolled. (was: Dump remaining edit logs from buffer if edit log can't be rolled.) > Print remaining edit logs from buffer if edit log can't be rolled. > -- > > Key: HDFS-11306 > URL: https://issues.apache.org/jira/browse/HDFS-11306 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-11306.001.patch > > > In HDFS-10943 [~yzhangal] reported that the edit log cannot be rolled due to > unexpected edit logs lingering in the buffer. > Unable to root cause the bug, I propose that we dump the remaining edit logs > in the buffer into the namenode log, before crashing the namenode. Use this new > capability to find the ops that sneak into the buffer unexpectedly, and > hopefully catch the bug. > This effort is orthogonal, but related to HDFS-11292, which adds additional > informational logs to help debug this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11305) libhdfs++: Log Datanode information when reading an HDFS block
[ https://issues.apache.org/jira/browse/HDFS-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaowei Zhu updated HDFS-11305: --- Attachment: HDFS-11305.HDFS-8707.000.patch 000.patch adds a debug log in FileHandleImpl::AsyncPreadSome to display the datanode hostname and IP address, along with the file path and offset of an HDFS block. > libhdfs++: Log Datanode information when reading an HDFS block > -- > > Key: HDFS-11305 > URL: https://issues.apache.org/jira/browse/HDFS-11305 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Xiaowei Zhu >Assignee: Xiaowei Zhu >Priority: Minor > Attachments: HDFS-11305.HDFS-8707.000.patch > > > The information can be logged as a debug log and contain the hostname and IP address, > along with the file path and offset. With this information, we can check things like > rack locality, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11305) libhdfs++: Log Datanode information when reading an HDFS block
[ https://issues.apache.org/jira/browse/HDFS-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaowei Zhu updated HDFS-11305: --- Status: Patch Available (was: Open) > libhdfs++: Log Datanode information when reading an HDFS block > -- > > Key: HDFS-11305 > URL: https://issues.apache.org/jira/browse/HDFS-11305 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Xiaowei Zhu >Assignee: Xiaowei Zhu >Priority: Minor > > The information can be logged as a debug log and contain the hostname and IP address, > along with the file path and offset. With this information, we can check things like > rack locality, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11306) Dump remaining edit logs from buffer if edit log can't be rolled.
[ https://issues.apache.org/jira/browse/HDFS-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-11306: --- Status: Patch Available (was: Open) > Dump remaining edit logs from buffer if edit log can't be rolled. > - > > Key: HDFS-11306 > URL: https://issues.apache.org/jira/browse/HDFS-11306 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-11306.001.patch > > > In HDFS-10943 [~yzhangal] reported that the edit log cannot be rolled due to > unexpected edit logs lingering in the buffer. > Unable to root cause the bug, I propose that we dump the remaining edit logs > in the buffer into the namenode log, before crashing the namenode. Use this new > capability to find the ops that sneak into the buffer unexpectedly, and > hopefully catch the bug. > This effort is orthogonal, but related to HDFS-11292, which adds additional > informational logs to help debug this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11306) Dump remaining edit logs from buffer if edit log can't be rolled.
[ https://issues.apache.org/jira/browse/HDFS-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-11306: --- Attachment: HDFS-11306.001.patch Uploaded patch v001. This patch adds a private method that dumps edit logs in human-readable format into the namenode log. A test case is also added. Any suggestions are greatly appreciated; I honestly do not have much experience with edit logs and HA. > Dump remaining edit logs from buffer if edit log can't be rolled. > - > > Key: HDFS-11306 > URL: https://issues.apache.org/jira/browse/HDFS-11306 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-11306.001.patch > > > In HDFS-10943 [~yzhangal] reported that the edit log cannot be rolled due to > unexpected edit logs lingering in the buffer. > Unable to root cause the bug, I propose that we dump the remaining edit logs > in the buffer into the namenode log, before crashing the namenode. Use this new > capability to find the ops that sneak into the buffer unexpectedly, and > hopefully catch the bug. > This effort is orthogonal, but related to HDFS-11292, which adds additional > informational logs to help debug this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
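[Editor's note] A sketch of the kind of dump described in the comment above — a hypothetical helper, not the code in HDFS-11306.001.patch: even when leftover buffer bytes cannot be decoded as edit-log ops, they can still be rendered as hex so something human-readable reaches the namenode log before the crash:

```java
public class EditBufferDump {
    // Formats leftover buffer bytes as hex, 16 bytes per line, so the
    // raw contents can be logged before the namenode aborts.
    static String hexDump(byte[] buf) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < buf.length; i++) {
            sb.append(String.format("%02x", buf[i] & 0xff));
            sb.append((i % 16 == 15 || i == buf.length - 1) ? '\n' : ' ');
        }
        return sb.toString();
    }
}
```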
[jira] [Created] (HDFS-11306) Dump remaining edit logs from buffer if edit log can't be rolled.
Wei-Chiu Chuang created HDFS-11306: -- Summary: Dump remaining edit logs from buffer if edit log can't be rolled. Key: HDFS-11306 URL: https://issues.apache.org/jira/browse/HDFS-11306 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang In HDFS-10943 [~yzhangal] reported that the edit log cannot be rolled due to unexpected edit logs lingering in the buffer. Unable to root cause the bug, I propose that we dump the remaining edit logs in the buffer into the namenode log, before crashing the namenode. Use this new capability to find the ops that sneak into the buffer unexpectedly, and hopefully catch the bug. This effort is orthogonal, but related to HDFS-11292, which adds additional informational logs to help debug this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812904#comment-15812904 ] Manoj Govindassamy edited comment on HDFS-9391 at 1/9/17 9:30 PM: -- Sure. It's about whether we should include all blocks in *Maintenance + Decommission* states under "Block with No Live Replicas" for each DN on the "Entering Maintenance" and "Decommissioning" pages. Previously I was trying to have them include only one of these states (as per the initial discussion in this jira). But, thinking more about it and after your comments, I feel including both states makes sense. Will upload the new patch soon. Thanks a lot for the review. was (Author: manojg): Sure. Its about whether we should include all blocks of *Maintenance + Decommission* states under "Block with No Live Replicas" for each of DN in the "Entering Maintenance" and "Decommissioning" page. Previously i was trying to have them include one of these states. But, thinking more about it and after your discussing i feel including both these states makes sense. Will upload the new patch soon. Thanks a lot for the review. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, HDFS-9391.03.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812904#comment-15812904 ] Manoj Govindassamy commented on HDFS-9391: -- Sure. It's about whether we should include all blocks of *Maintenance + Decommission* states under "Block with No Live Replicas" for each DN in the "Entering Maintenance" and "Decommissioning" page. Previously I was trying to have them include one of these states. But, thinking more about it and after our discussion, I feel including both these states makes sense. Will upload the new patch soon. Thanks a lot for the review. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, HDFS-9391.03.patch, Maintenance webUI.png > >
[jira] [Commented] (HDFS-11209) SNN can't checkpoint when rolling upgrade is not finalized
[ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812883#comment-15812883 ] Hadoop QA commented on HDFS-11209: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 151 unchanged - 0 fixed = 153 total (was 151) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 0s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}103m 52s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality | | | hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11209 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846397/HDFS-11209.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux aee66256 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 91bf504 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18112/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18112/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Updated] (HDFS-11305) libhdfs++: Log Datanode information when reading an HDFS block
[ https://issues.apache.org/jira/browse/HDFS-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaowei Zhu updated HDFS-11305: --- Summary: libhdfs++: Log Datanode information when reading an HDFS block (was: Log Datanode information when reading an HDFS block) > libhdfs++: Log Datanode information when reading an HDFS block > -- > > Key: HDFS-11305 > URL: https://issues.apache.org/jira/browse/HDFS-11305 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Xiaowei Zhu >Assignee: Xiaowei Zhu >Priority: Minor > > The information can be logged at debug level and contain the hostname and IP > address, along with the file path and offset. With this information, we can > check things like rack locality, etc.
[jira] [Created] (HDFS-11305) Log Datanode information when reading an HDFS block
Xiaowei Zhu created HDFS-11305: -- Summary: Log Datanode information when reading an HDFS block Key: HDFS-11305 URL: https://issues.apache.org/jira/browse/HDFS-11305 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xiaowei Zhu Assignee: Xiaowei Zhu Priority: Minor The information can be logged at debug level and contain the hostname and IP address, along with the file path and offset. With this information, we can check things like rack locality, etc.
[jira] [Updated] (HDFS-11299) Support multiple Datanode File IO hooks
[ https://issues.apache.org/jira/browse/HDFS-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-11299: -- Attachment: HDFS-11299.002.patch Thank you [~xyao] and [~arpitagarwal] for reviewing the patch and for the comments. I have addressed them in patch v02. > Support multiple Datanode File IO hooks > --- > > Key: HDFS-11299 > URL: https://issues.apache.org/jira/browse/HDFS-11299 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11299.000.patch, HDFS-11299.001.patch, > HDFS-11299.002.patch > > > HDFS-10958 introduces instrumentation hooks around DataNode disk IO and > HDFS-10959 adds support for profiling hooks to expose latency statistics. > Instead of choosing only one hook via config parameters, we want to add two > separate hooks - one for profiling and one for fault injection. The fault > injection hook will be useful for testing purposes. > This jira only introduces support for the fault injection hook. The > implementation for that will come later on. > Also, the Default and Counting FileIOEvents would no longer be needed, as we > can control enabling the profiling and fault injection hooks using config > parameters.
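The two-hook design described in HDFS-11299 can be illustrated roughly like this. {{FileIoEvents}} here is a stand-in interface and the boolean flags stand in for the real config parameters; this is a sketch of the idea, not the patch:

```java
// Hedged sketch: instead of picking a single hook class, build a list of
// independently enabled hooks (profiling + fault injection). The interface
// and flag names are assumptions for illustration, not HDFS-10958 code.
import java.util.ArrayList;
import java.util.List;

public class FileIoHookChain {
    /** Stand-in for the DataNode file-IO event hook interface. */
    interface FileIoEvents {
        void onIoCompleted(String path, long durationNanos);
    }

    /** Each hook is toggled by its own (hypothetical) config flag. */
    static List<FileIoEvents> buildHooks(boolean profilingEnabled, boolean faultInjectionEnabled) {
        List<FileIoEvents> hooks = new ArrayList<>();
        if (profilingEnabled) {
            // Profiling hook: record latency statistics.
            hooks.add((path, d) -> System.out.println("profile " + path + " " + d + "ns"));
        }
        if (faultInjectionEnabled) {
            // Fault-injection hook: a no-op here; tests would inject delays or errors.
            hooks.add((path, d) -> { });
        }
        return hooks;
    }
}
```

With both flags independent, the dedicated Default/Counting event classes become unnecessary, which matches the last point in the description.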
[jira] [Updated] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-11273: -- Attachment: HDFS-11273.004.patch > Move TransferFsImage#doGetUrl function to a Util class > -- > > Key: HDFS-11273 > URL: https://issues.apache.org/jira/browse/HDFS-11273 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11273.000.patch, HDFS-11273.001.patch, > HDFS-11273.002.patch, HDFS-11273.003.patch, HDFS-11273.004.patch > > > TransferFsImage#doGetUrl downloads files from the specified url and stores > them in the specified storage location. HDFS-4025 plans to synchronize the > log segments in JournalNodes. If a log segment is missing from a JN, the JN > downloads it from another JN which has the required log segment. We need > TransferFsImage#doGetUrl and TransferFsImage#receiveFile to accomplish this. > So we propose to move the said functions to a Utility class so as to be able > to use it for JournalNode syncing as well, without duplication of code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11293) [SPS]: Local DN should be given preference as source node, when target available in same node
[ https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812794#comment-15812794 ] Hadoop QA commented on HDFS-11293: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 0s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} 
| {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}151m 53s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}172m 48s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure | | | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby | | | hadoop.hdfs.server.datanode.TestLargeBlockReport | | | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120 | | | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 | | | hadoop.hdfs.TestFileChecksum | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11293 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846378/HDFS-11293-HDFS-10285-02.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 5f854f997750 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-10285 / 5aacf2a | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18109/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18109/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hd
[jira] [Commented] (HDFS-7967) Reduce the performance impact of the balancer
[ https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812791#comment-15812791 ] Allen Wittenauer commented on HDFS-7967: Sorry. :( I tried hard to make hyphens work as separators (and in some cases they do), but other cases were just not reliable enough, even when cross-referencing a list of valid branches. (e.g., is HADOOP-6671-HADOOP-6671-2.patch a patch for the HADOOP-6671 branch or the HADOOP-6671-2 branch? The YARN-5355-branch-2 branch brings a whole new level of hurt...) > Reduce the performance impact of the balancer > - > > Key: HDFS-7967 > URL: https://issues.apache.org/jira/browse/HDFS-7967 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-7967-branch-2.8.patch, HDFS-7967-branch-2.patch, > HDFS-7967.branch-2-1.patch, HDFS-7967.branch-2.001.patch, > HDFS-7967.branch-2.8-1.patch, HDFS-7967.branch-2.8.001.patch > > > The balancer needs to query for blocks to move from overly full DNs. The > block lookup is extremely inefficient. An iterator of the node's blocks is > created from the iterators of its storages' blocks. A random number is > chosen corresponding to how many blocks will be skipped via the iterator. > Each skip requires costly scanning of triplets. > The current design also only considers node imbalances while ignoring > imbalances within the nodes' storages. A more efficient and intelligent > design may eliminate the costly skipping of blocks via round-robin selection > of blocks from the storages based on remaining capacity.
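Allen's ambiguity example can be made concrete with a toy resolver. {{candidateBranches}} and the two naming forms it checks are assumptions for illustration, not the actual pre-commit (Yetus) code:

```java
// Toy sketch of why hyphen-separated patch names are ambiguous: under the two
// common forms ISSUE-BRANCH.patch and ISSUE-BRANCH-REV.patch (numeric
// revision), one file name can legitimately match more than one known branch.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PatchBranchGuess {
    /**
     * Return every known branch the patch name could plausibly target.
     * More than one candidate means the name cannot be resolved reliably.
     */
    static List<String> candidateBranches(String patchName, List<String> knownBranches) {
        String base = patchName.replaceFirst("\\.patch$", "");
        List<String> hits = new ArrayList<>();
        for (String branch : knownBranches) {
            if (base.endsWith("-" + branch)) {
                hits.add(branch);                                      // ISSUE-BRANCH.patch
            } else if (base.matches(".*-" + Pattern.quote(branch) + "-\\d+")) {
                hits.add(branch);                                      // ISSUE-BRANCH-REV.patch
            }
        }
        return hits;
    }
}
```

For "HADOOP-6671-HADOOP-6671-2.patch" with known branches HADOOP-6671 and HADOOP-6671-2, both forms match, which is exactly the unreliability being described; dots as separators (the HDFS-7967.branch-2.001.patch style above) avoid the collision.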
[jira] [Updated] (HDFS-7967) Reduce the performance impact of the balancer
[ https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7967: -- Attachment: HDFS-7967.branch-2.001.patch HDFS-7967.branch-2.8.001.patch Renamed patches to make pre-commit happ(ier). > Reduce the performance impact of the balancer > - > > Key: HDFS-7967 > URL: https://issues.apache.org/jira/browse/HDFS-7967 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-7967-branch-2.8.patch, HDFS-7967-branch-2.patch, > HDFS-7967.branch-2-1.patch, HDFS-7967.branch-2.001.patch, > HDFS-7967.branch-2.8-1.patch, HDFS-7967.branch-2.8.001.patch > > > The balancer needs to query for blocks to move from overly full DNs. The > block lookup is extremely inefficient. An iterator of the node's blocks is > created from the iterators of its storages' blocks. A random number is > chosen corresponding to how many blocks will be skipped via the iterator. > Each skip requires costly scanning of triplets. > The current design also only considers node imbalances while ignoring > imbalances within the nodes' storages. A more efficient and intelligent > design may eliminate the costly skipping of blocks via round-robin selection > of blocks from the storages based on remaining capacity.
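The round-robin idea in the quoted description can be sketched as follows. {{Storage}} and {{pickBlocks}} are invented names and this is not the actual Balancer implementation, only an illustration of selecting blocks per-storage by remaining capacity instead of random skipping:

```java
// Hedged sketch: cycle over a node's storages, fullest (least remaining
// capacity) first, taking one block from each per round, rather than skipping
// a random number of blocks through a merged iterator.
import java.util.*;

public class RoundRobinBlockPicker {
    static final class Storage {
        final String id;
        final long remaining;          // bytes of remaining capacity
        final Deque<Long> blockIds;    // blocks hosted on this storage
        Storage(String id, long remaining, List<Long> blocks) {
            this.id = id;
            this.remaining = remaining;
            this.blockIds = new ArrayDeque<>(blocks);
        }
    }

    /** Pick up to n blocks, visiting storages round-robin, fullest first. */
    static List<Long> pickBlocks(List<Storage> storages, int n) {
        List<Storage> order = new ArrayList<>(storages);
        order.sort(Comparator.comparingLong(s -> s.remaining)); // least remaining first
        List<Long> picked = new ArrayList<>();
        while (picked.size() < n) {
            boolean any = false;
            for (Storage s : order) {
                if (!s.blockIds.isEmpty() && picked.size() < n) {
                    picked.add(s.blockIds.poll());
                    any = true;
                }
            }
            if (!any) break; // all storages exhausted
        }
        return picked;
    }
}
```

Unlike the random-skip approach, each pick here is O(1) per storage, and storage-level imbalance is addressed because fuller storages are drained first.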
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812642#comment-15812642 ] Ming Ma commented on HDFS-9391: --- Thanks Manoj. I just found something related to our discussion. For any decommissioning node, given getDecommissionOnlyReplicas is the same as getOutOfServiceOnlyReplicas, can we just use getOutOfServiceOnlyReplicas value for JSON decommissionOnlyReplicas property? Same for any entering maintenance node. In other words, we might not need to add the extra decommissionOnlyReplicas and maintenanceOnlyReplicas to LeavingServiceStatus. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, HDFS-9391.03.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-11304) Namenode fails to start, even edit log available in the journal node
[ https://issues.apache.org/jira/browse/HDFS-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812634#comment-15812634 ] Wei-Chiu Chuang edited comment on HDFS-11304 at 1/9/17 7:38 PM: Thanks for reporting the issue, [~kpalanisamy]. If a NameNode crashes because the edit log has a gap, and the gap is not due to some operational error, it can't be just a minor issue. It has to be at least a major one. Bumping up the priority. was (Author: jojochuang): Thanks for reporting the issue, [~kpalanisamy]. If a NameNode crashes because the edit log has a gap, it can't be just a minor issue. It has to be at least a major one. Bumping up the priority. > Namenode fails to start, even edit log available in the journal node > > > Key: HDFS-11304 > URL: https://issues.apache.org/jira/browse/HDFS-11304 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, journal-node >Affects Versions: 2.8.0, 2.7.1 > Environment: *HDP 2.4.2.0-258* >Reporter: Karthik P >Assignee: Karthik P > Labels: patch > > JN => JournalNode > NN => Namenode local directory (_dfs.namenode.name.dir_) > Y/N => Is edit file/log present? > Ex : edits_1627921-1627961 > *Scenario:* > ||JN 1||JN 2||JN 3||NN local|| Is NN started? > |N|N|Y|N|Started| > |Y|N|N|N|Started| > |N|Y|N|N|Failed| > |N|Y|N|Y|Started| > |Y|Y|N|N|Started| > *Note:* Namenode and JN2 installed on the same machine > *Trace :* > ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start > namenode. > java.io.IOException: There appears to be a gap in the edit log. We expected > txid 1627921, but got txid 1627962. 
> at > org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:837) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:692) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:983) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:688) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:662) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:726) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:951) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:935) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1641) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1707)
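The gap check that produces the exception above can be illustrated with a small sketch. {{findGap}} and the {{[start, end]}} segment representation are assumptions for the example, not FSEditLogLoader code:

```java
// Hedged sketch: walk edit-log segments (each covering txids [start, end])
// and report the first discontinuity, mirroring the "We expected txid X,
// but got txid Y" failure in the trace above.
import java.util.List;

public class EditLogGapCheck {
    /**
     * Return a message describing the first gap, or null if the stream is
     * contiguous starting from expectedTxid.
     */
    static String findGap(long expectedTxid, List<long[]> segments) {
        for (long[] seg : segments) {
            if (seg[0] != expectedTxid) {
                return "There appears to be a gap in the edit log. We expected txid "
                        + expectedTxid + ", but got txid " + seg[0] + ".";
            }
            expectedTxid = seg[1] + 1; // next segment must start right after this one
        }
        return null;
    }
}
```

In the reported scenario, the NameNode expects a segment starting at 1627921 but the first available segment starts at 1627962, so the load aborts even though other JournalNodes may still hold the missing edits_1627921-1627961 file.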
[jira] [Updated] (HDFS-11304) Namenode fails to start, even edit log available in the journal node
[ https://issues.apache.org/jira/browse/HDFS-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-11304: --- Priority: Major (was: Minor) > Namenode fails to start, even edit log available in the journal node > > > Key: HDFS-11304 > URL: https://issues.apache.org/jira/browse/HDFS-11304 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, journal-node >Affects Versions: 2.8.0, 2.7.1 > Environment: *HDP 2.4.2.0-258* >Reporter: Karthik P >Assignee: Karthik P > Labels: patch > > JN => JournalNode > NN => Namenode local directory (_dfs.namenode.name.dir_) > Y/N => Is edit file/log present? > Ex : edits_1627921-1627961 > *Scenario:* > ||JN 1||JN 2||JN 3||NN local|| Is NN started? > |N|N|Y|N|Started| > |Y|N|N|N|Started| > |N|Y|N|N|Failed| > |N|Y|N|Y|Started| > |Y|Y|N|N|Started| > *Note:* Namenode and JN2 installed on the same machine > *Trace :* > ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start > namenode. > java.io.IOException: There appears to be a gap in the edit log. We expected > txid 1627921, but got txid 1627962. 
> at > org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:837) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:692) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:983) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:688) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:662) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:726) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:951) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:935) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1641) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1707)
[jira] [Commented] (HDFS-11304) Namenode fails to start, even edit log available in the journal node
[ https://issues.apache.org/jira/browse/HDFS-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812634#comment-15812634 ] Wei-Chiu Chuang commented on HDFS-11304: Thanks for reporting the issue, [~kpalanisamy]. If a NameNode crashes because the edit log has a gap, it can't be just a minor issue. It has to be at least a major one. Bumping up the priority. > Namenode fails to start, even edit log available in the journal node > > > Key: HDFS-11304 > URL: https://issues.apache.org/jira/browse/HDFS-11304 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, journal-node >Affects Versions: 2.8.0, 2.7.1 > Environment: *HDP 2.4.2.0-258* >Reporter: Karthik P >Assignee: Karthik P > Labels: patch > > JN => JournalNode > NN => Namenode local directory (_dfs.namenode.name.dir_) > Y/N => Is edit file/log present? > Ex : edits_1627921-1627961 > *Scenario:* > ||JN 1||JN 2||JN 3||NN local|| Is NN started? > |N|N|Y|N|Started| > |Y|N|N|N|Started| > |N|Y|N|N|Failed| > |N|Y|N|Y|Started| > |Y|Y|N|N|Started| > *Note:* Namenode and JN2 installed on the same machine > *Trace :* > ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start > namenode. > java.io.IOException: There appears to be a gap in the edit log. We expected > txid 1627921, but got txid 1627962. 
> at > org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:837) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:692) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:983) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:688) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:662) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:726) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:951) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:935) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1641) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1707)
[jira] [Updated] (HDFS-11209) SNN can't checkpoint when rolling upgrade is not finalized
[ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-11209: -- Attachment: HDFS-11209.01.patch Fix the build issue. > SNN can't checkpoint when rolling upgrade is not finalized > -- > > Key: HDFS-11209 > URL: https://issues.apache.org/jira/browse/HDFS-11209 > Project: Hadoop HDFS > Issue Type: Bug > Components: rolling upgrades >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Critical > Attachments: HDFS-11209.00.patch, HDFS-11209.01.patch > > > A similar problem was fixed in HDFS-7185; the recent change in HDFS-8432 > brings it back. > With HDFS-8432, the primary NN will not update the VERSION file to the new > version after running with the "rollingUpgrade" option until the upgrade is > finalized. This is to support more downgrade use cases. > However, the checkpoint on the SNN incorrectly updates the VERSION file > when the rollingUpgrade is not finalized yet. As a result, the SNN > checkpoints successfully but fails to push the image to the primary NN, > because its version is higher than the primary NN's, as shown below. > {code} > 2016-12-02 05:25:31,918 ERROR namenode.SecondaryNameNode > (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint > org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: > Image uploading failed, status: 403, url: > http://NN:50070/imagetransfer?txid=345404754&imageFile=IMAGE&File-Le..., > message: This namenode has storage info -60:221856466:1444080250181:clusterX > but the secondary expected -63:221856466:1444080250181:clusterX > {code}
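The 403 in the quoted log comes from comparing storage-info strings of the form layoutVersion:namespaceID:cTime:clusterID. A hedged sketch of such a comparison ({{firstMismatch}} is an invented helper, not TransferFsImage code):

```java
// Hedged sketch: split two storage-info strings on ':' and name the first
// field that differs. In the HDFS-11209 log, the SNN advertises layout
// version -63 while the primary NN is still at -60, so the upload is rejected.
public class StorageInfoCheck {
    static final String[] FIELDS = {"layoutVersion", "namespaceID", "cTime", "clusterID"};

    /** Return the first differing field name, or null if the infos match. */
    static String firstMismatch(String a, String b) {
        String[] fa = a.split(":");
        String[] fb = b.split(":");
        for (int i = 0; i < FIELDS.length; i++) {
            if (!fa[i].equals(fb[i])) {
                return FIELDS[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Values taken from the error message above.
        String nn  = "-60:221856466:1444080250181:clusterX";
        String snn = "-63:221856466:1444080250181:clusterX";
        System.out.println("mismatched field: " + firstMismatch(nn, snn));
    }
}
```

Only the first field differs here, which is consistent with the description: the SNN prematurely bumped the layout version in its VERSION file while the rolling upgrade was still unfinalized.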
[jira] [Commented] (HDFS-11209) SNN can't checkpoint when rolling upgrade is not finalized
[ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812624#comment-15812624 ] Hadoop QA commented on HDFS-11209: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 27s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 27s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 0m 27s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 27s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 25s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 149 unchanged - 0 fixed = 151 total (was 149) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 27s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 15s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 29s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 22m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HDFS-11209 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846388/HDFS-11209.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 4eb08e4769c8 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 91bf504 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/18110/artifact/patchprocess/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt | | compile | https://builds.apache.org/job/PreCommit-HDFS-Build/18110/artifact/patchprocess/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt | | cc | https://builds.apache.org/job/PreCommit-HDFS-Build/18110/artifact/patchprocess/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt | | javac | https://builds.apache.org/job/PreCommit-HDFS-Build/18110/artifact/patchprocess/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt | | checkstyle | https://builds.apache.org/
[jira] [Commented] (HDFS-7967) Reduce the performance impact of the balancer
[ https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812618#comment-15812618 ] Hadoop QA commented on HDFS-7967: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HDFS-7967 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-7967 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846394/HDFS-7967.branch-2-1.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18111/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Reduce the performance impact of the balancer > - > > Key: HDFS-7967 > URL: https://issues.apache.org/jira/browse/HDFS-7967 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-7967-branch-2.8.patch, HDFS-7967-branch-2.patch, > HDFS-7967.branch-2-1.patch, HDFS-7967.branch-2.8-1.patch > > > The balancer needs to query for blocks to move from overly full DNs. The > block lookup is extremely inefficient. An iterator of the node's blocks is > created from the iterators of its storages' blocks. A random number is > chosen corresponding to how many blocks will be skipped via the iterator. > Each skip requires costly scanning of triplets. > The current design also only considers node imbalances while ignoring > imbalances within the nodes' storages. 
A more efficient and intelligent > design may eliminate the costly skipping of blocks via round-robin selection > of blocks from the storages based on remaining capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
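The round-robin idea in the description above can be sketched as follows. This is an illustrative sketch, not Balancer code: the `Storage` class, the `pick` method, and the one-unit capacity decrement per block are all invented here purely to show selection by remaining capacity instead of random skipping through a single iterator.

```java
import java.util.*;

// Hypothetical sketch of round-robin block selection: always draw the next
// candidate block from the storage with the most remaining capacity, so no
// blocks are skipped and per-storage imbalance is taken into account.
public class RoundRobinPicker {
    static class Storage {
        final String id;
        final Deque<String> blocks;
        long remainingCapacity;
        Storage(String id, long remaining, List<String> blocks) {
            this.id = id;
            this.remainingCapacity = remaining;
            this.blocks = new ArrayDeque<>(blocks);
        }
    }

    // Returns up to `count` blocks in the order they would be offered for
    // moving, preferring storages with more remaining capacity.
    static List<String> pick(List<Storage> storages, int count) {
        List<String> out = new ArrayList<>();
        PriorityQueue<Storage> pq = new PriorityQueue<>(
            (a, b) -> Long.compare(b.remainingCapacity, a.remainingCapacity));
        for (Storage s : storages) {
            if (!s.blocks.isEmpty()) pq.add(s);
        }
        while (out.size() < count && !pq.isEmpty()) {
            Storage s = pq.poll();
            out.add(s.blocks.poll());
            s.remainingCapacity -= 1; // toy model: one unit per block
            if (!s.blocks.isEmpty()) pq.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        Storage a = new Storage("A", 10, Arrays.asList("a1", "a2"));
        Storage b = new Storage("B", 5, Arrays.asList("b1"));
        System.out.println(pick(Arrays.asList(a, b), 3)); // [a1, a2, b1]
    }
}
```

The point of the sketch is that each candidate is produced in O(log S) for S storages, versus the costly linear skipping over triplets the description criticizes.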
[jira] [Commented] (HDFS-11301) Double wrapping over RandomAccessFile in LocalReplicaInPipeline#createStreams
[ https://issues.apache.org/jira/browse/HDFS-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812602#comment-15812602 ] Hanisha Koneru commented on HDFS-11301: --- Thank you [~arpitagarwal]. > Double wrapping over RandomAccessFile in LocalReplicaInPipeline#createStreams > - > > Key: HDFS-11301 > URL: https://issues.apache.org/jira/browse/HDFS-11301 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Minor > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11301.000.patch > > > In LocalReplicaInPipeline#createStreams, there is a WrappedFileOutputStream > created over a WrappedRandomAccessFile. This double layer of instrumentation > is unnecessary. > {quote} > blockOut = fileIoProvider.getFileOutputStream(getVolume(), > fileIoProvider.getRandomAccessFile(getVolume(), blockFile, "rw") > .getFD()); > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
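The cost of the double layer described above can be illustrated with a small stand-in for the instrumentation wrapper. `CountingOut` below is hypothetical, not an HDFS class; it only demonstrates that stacking two instrumented streams makes each write observed twice, which is why one wrapping layer should be removed.

```java
import java.io.*;

// Stand-in for an instrumentation wrapper that counts write(int) calls.
public class DoubleWrapDemo {
    static class CountingOut extends FilterOutputStream {
        static int writes = 0;
        CountingOut(OutputStream out) { super(out); }
        @Override public void write(int b) throws IOException {
            writes++;          // record the event, then pass the byte along
            super.write(b);
        }
    }

    // Writes one byte through either a doubly- or singly-wrapped stream and
    // reports how many times the instrumentation layer saw it.
    static int countWrites(boolean doubleWrap) throws IOException {
        CountingOut.writes = 0;
        OutputStream sink = new ByteArrayOutputStream();
        OutputStream out = doubleWrap
            ? new CountingOut(new CountingOut(sink))
            : new CountingOut(sink);
        out.write(1);
        return CountingOut.writes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countWrites(true));  // the same byte counted twice
        System.out.println(countWrites(false)); // counted once after the fix
    }
}
```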
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812601#comment-15812601 ] Chris Nauroth commented on HDFS-11163: -- [~surendrasingh], thank you for the patch. This looks correct to me. One thing I'm unsure about is the potential impact on performance of Mover. It will require an additional {{getStoragePolicy}} RPC per file with the default storage policy, whereas previously there was no RPC for those files. Unfortunately, I don't see a way to avoid that, at least not with the current APIs, because that's how we resolve inheritance of storage policies from parent paths. I would prefer to get an opinion from [~szetszwo]. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
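The resolution rule behind the extra RPC discussed above is that an unspecified policy id (0, {{BLOCK_STORAGE_POLICY_ID_UNSPECIFIED}}) falls back to the default policy, so the Mover cannot distinguish "explicitly default" from "unspecified, inherit from parent" without asking the NameNode. A minimal sketch; the default id of 7 is an assumption mirroring HDFS's HOT policy loosely, and this is not {{BlockStoragePolicySuite}}'s actual code:

```java
// Sketch of storage-policy id resolution: 0 means "not specified", which
// resolves to the default policy; any other id is taken as-is. The value 7
// for the default is an assumption for illustration only.
public class PolicyResolver {
    static final byte UNSPECIFIED = 0;
    static final byte DEFAULT_POLICY = 7; // assumed default (HOT-like)

    static byte resolve(byte id) {
        return id == UNSPECIFIED ? DEFAULT_POLICY : id;
    }

    public static void main(String[] args) {
        System.out.println(resolve((byte) 0));  // unspecified -> default
        System.out.println(resolve((byte) 12)); // explicit policy kept
    }
}
```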
[jira] [Updated] (HDFS-7967) Reduce the performance impact of the balancer
[ https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7967: -- Attachment: HDFS-7967.branch-2-1.patch HDFS-7967.branch-2.8-1.patch Simple change from the JDK predicate to the Google predicate. > Reduce the performance impact of the balancer > - > > Key: HDFS-7967 > URL: https://issues.apache.org/jira/browse/HDFS-7967 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-7967-branch-2.8.patch, HDFS-7967-branch-2.patch, > HDFS-7967.branch-2-1.patch, HDFS-7967.branch-2.8-1.patch > > > The balancer needs to query for blocks to move from overly full DNs. The > block lookup is extremely inefficient. An iterator of the node's blocks is > created from the iterators of its storages' blocks. A random number is > chosen corresponding to how many blocks will be skipped via the iterator. > Each skip requires costly scanning of triplets. > The current design also only considers node imbalances while ignoring > imbalances within the nodes' storages. A more efficient and intelligent > design may eliminate the costly skipping of blocks via round-robin selection > of blocks from the storages based on remaining capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11303) Hedged read might hang infinitely if read data from all DN failed
[ https://issues.apache.org/jira/browse/HDFS-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812572#comment-15812572 ] stack commented on HDFS-11303: -- Patch LGTM. Your patch allows that the primary read might still complete before the new hedged reads, whereas what was there previously would discard anything that came in after the timeout. Good. The test is just to verify we time out? W/o your fix, the test hangs? > Hedged read might hang infinitely if read data from all DN failed > -- > > Key: HDFS-11303 > URL: https://issues.apache.org/jira/browse/HDFS-11303 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0-alpha1 >Reporter: Chen Zhang > Attachments: HDFS-11303-001.patch > > > Hedged read reads from one DN first and, on timeout, reads from the other DNs > simultaneously. > If the reads from all DNs fail, this bug leaves the future list non-empty (the > first timed-out request is left in the list), so the loop hangs infinitely -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
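The hang and the shape of the fix can be sketched outside of the HDFS client like this. This is not `DFSInputStream`'s actual code; the point is only that a completed-but-failed future must be removed from the outstanding list, because a loop over a list that never shrinks spins forever once every replica has failed.

```java
import java.util.*;
import java.util.concurrent.*;

// Toy model of a hedged read: poll a list of outstanding reads until one
// succeeds. The crucial detail is it.remove() below: a finished future
// leaves the list whether it succeeded or failed, so the loop terminates
// once every replica has been tried.
public class HedgedReadSketch {
    static String readHedged(List<Future<String>> futures) throws Exception {
        while (!futures.isEmpty()) {
            Iterator<Future<String>> it = futures.iterator();
            while (it.hasNext()) {
                Future<String> f = it.next();
                if (!f.isDone()) {
                    continue;
                }
                it.remove(); // without this, a failed read hangs the loop
                try {
                    return f.get(); // first successful replica wins
                } catch (ExecutionException e) {
                    // this replica failed; keep waiting on the others
                }
            }
            Thread.sleep(1);
        }
        throw new ExecutionException("all replicas failed", null);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<String>> reads = new ArrayList<>();
        reads.add(pool.submit((Callable<String>) () -> {
            throw new RuntimeException("DN1 read failed");
        }));
        reads.add(pool.submit(() -> "data-from-DN2"));
        System.out.println(readHedged(reads)); // the surviving replica's data
        pool.shutdown();
    }
}
```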
[jira] [Updated] (HDFS-11209) SNN can't checkpoint when rolling upgrade is not finalized
[ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-11209: -- Attachment: HDFS-11209.00.patch Attached a patch to check the NN rolling upgrade status before updating the VERSION file on the SNN and Backup NN. The original code that checks the SNN namesystem's rollingUpgrade flag won't work, as the SNN never starts with the rollingUpgrade option. The Backup NN should have a similar issue. Will add a unit test later. > SNN can't checkpoint when rolling upgrade is not finalized > -- > > Key: HDFS-11209 > URL: https://issues.apache.org/jira/browse/HDFS-11209 > Project: Hadoop HDFS > Issue Type: Bug > Components: rolling upgrades >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Critical > Attachments: HDFS-11209.00.patch > > > A similar problem was fixed by HDFS-7185. A recent change in HDFS-8432 > brings it back. > With HDFS-8432, the primary NN will not update the VERSION file to the new > version after running with the "rollingUpgrade" option until the upgrade is > finalized. This is to support more downgrade use cases. > However, the checkpoint on the SNN incorrectly updates the VERSION file > while the rollingUpgrade is not yet finalized. As a result, the SNN checkpoints > successfully but fails to push the image to the primary NN because its version is > higher than the primary NN's, as shown below. 
> {code} > 2016-12-02 05:25:31,918 ERROR namenode.SecondaryNameNode > (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint > org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: > Image uploading failed, status: 403, url: > http://NN:50070/imagetransfer?txid=345404754&imageFile=IMAGE&File-Le..., > message: This namenode has storage info -60:221856466:1444080250181:clusterX > but the secondary expected -63:221856466:1444080250181:clusterX > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
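The guard described in the patch summary above amounts to choosing which layout version the checkpoint writes into VERSION. A sketch under assumed names: `chooseLayoutVersion` is invented for illustration, and the real patch queries the active NN's rolling-upgrade status rather than taking a boolean parameter.

```java
// Sketch of the VERSION-file guard: while a rolling upgrade is in progress
// and not finalized, the checkpointer stays on the old layout version so
// the image it uploads matches the primary NN's storage info. The layout
// values -60 / -63 echo the error log above; names are illustrative.
public class VersionGuard {
    static int chooseLayoutVersion(boolean rollingUpgradeInProgress,
                                   int oldLayout, int newLayout) {
        return rollingUpgradeInProgress ? oldLayout : newLayout;
    }

    public static void main(String[] args) {
        System.out.println(chooseLayoutVersion(true, -60, -63));  // pending: keep old
        System.out.println(chooseLayoutVersion(false, -60, -63)); // finalized: move on
    }
}
```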
[jira] [Updated] (HDFS-11209) SNN can't checkpoint when rolling upgrade is not finalized
[ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-11209: -- Status: Patch Available (was: Open) > SNN can't checkpoint when rolling upgrade is not finalized > -- > > Key: HDFS-11209 > URL: https://issues.apache.org/jira/browse/HDFS-11209 > Project: Hadoop HDFS > Issue Type: Bug > Components: rolling upgrades >Affects Versions: 3.0.0-alpha1, 2.8.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Critical > Attachments: HDFS-11209.00.patch > > > A similar problem was fixed by HDFS-7185. A recent change in HDFS-8432 > brings it back. > With HDFS-8432, the primary NN will not update the VERSION file to the new > version after running with the "rollingUpgrade" option until the upgrade is > finalized. This is to support more downgrade use cases. > However, the checkpoint on the SNN incorrectly updates the VERSION file > while the rollingUpgrade is not yet finalized. As a result, the SNN checkpoints > successfully but fails to push the image to the primary NN because its version is > higher than the primary NN's, as shown below. > {code} > 2016-12-02 05:25:31,918 ERROR namenode.SecondaryNameNode > (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint > org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: > Image uploading failed, status: 403, url: > http://NN:50070/imagetransfer?txid=345404754&imageFile=IMAGE&File-Le..., > message: This namenode has storage info -60:221856466:1444080250181:clusterX > but the secondary expected -63:221856466:1444080250181:clusterX > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10733) NameNode terminated after full GC thinking QJM is unresponsive.
[ https://issues.apache.org/jira/browse/HDFS-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812538#comment-15812538 ] Vinitha Reddy Gankidi commented on HDFS-10733: -- [~kihwal] Thanks for the great suggestion. I have attached a patch that increases the endtime/timeout if there is a long pause due to a Full GC in the NN. The included unit test asserts that when there genuinely are no responses from the journal nodes, a timeout exception is still thrown rather than the timeout being extended as it is for a Full GC. Please take a look. > NameNode terminated after full GC thinking QJM is unresponsive. > --- > > Key: HDFS-10733 > URL: https://issues.apache.org/jira/browse/HDFS-10733 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, qjm >Affects Versions: 2.6.4 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi > Attachments: HDFS-10733.001.patch > > > NameNode went into full GC while in {{AsyncLoggerSet.waitForWriteQuorum()}}. > After completing GC it checks if the timeout for quorum is reached. If the GC > was long enough the timeout can expire, and {{QuorumCall.waitFor()}} will > throw {{TimeoutException}}. Finally {{FSEditLog.logSync()}} catches the > exception and terminates the NameNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
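The endtime/timeout extension can be sketched as a deadline adjustment: if the waiter slept far longer than it asked to, treat the excess as a GC pause and push the deadline out by that amount. The method name and the heuristic below are illustrative, not `QuorumCall`'s actual code.

```java
// Sketch of pause-aware quorum waiting: compare how long we actually slept
// against how long we asked to sleep, and extend the deadline by any excess
// (attributed to a stop-the-world GC) so the journals aren't declared
// unresponsive just because the JVM was frozen.
public class PauseAwareWait {
    static long adjustDeadline(long deadlineMs, long expectedSleepMs,
                               long lastWakeMs, long nowMs) {
        long actualSleep = nowMs - lastWakeMs;
        long excess = actualSleep - expectedSleepMs;
        return excess > 0 ? deadlineMs + excess : deadlineMs;
    }

    public static void main(String[] args) {
        // Asked for 100 ms but woke 5000 ms later: a ~4900 ms pause, so the
        // 10000 ms deadline moves out to 14900 ms.
        System.out.println(adjustDeadline(10000, 100, 0, 5000));
        // Normal wakeup: deadline unchanged.
        System.out.println(adjustDeadline(10000, 100, 0, 50));
    }
}
```

Note the complementary behavior the unit test checks: when responses truly never arrive, the deadline is only extended by detected pauses, so the wait still times out eventually.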
[jira] [Updated] (HDFS-10733) NameNode terminated after full GC thinking QJM is unresponsive.
[ https://issues.apache.org/jira/browse/HDFS-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinitha Reddy Gankidi updated HDFS-10733: - Attachment: HDFS-10733.001.patch > NameNode terminated after full GC thinking QJM is unresponsive. > --- > > Key: HDFS-10733 > URL: https://issues.apache.org/jira/browse/HDFS-10733 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, qjm >Affects Versions: 2.6.4 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi > Attachments: HDFS-10733.001.patch > > > NameNode went into full GC while in {{AsyncLoggerSet.waitForWriteQuorum()}}. > After completing GC it checks if the timeout for quorum is reached. If the GC > was long enough the timeout can expire, and {{QuorumCall.waitFor()}} will > throw {{TimeoutException}}. Finally {{FSEditLog.logSync()}} catches the > exception and terminates the NameNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11301) Double wrapping over RandomAccessFile in LocalReplicaInPipeline#createStreams
[ https://issues.apache.org/jira/browse/HDFS-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812524#comment-15812524 ] Hudson commented on HDFS-11301: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11091 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11091/]) HDFS-11301. Double wrapping over RandomAccessFile in (arp: rev 91bf504440967ccdff1cb1cbe7801a5ce2ba88ab) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/LocalReplicaInPipeline.java > Double wrapping over RandomAccessFile in LocalReplicaInPipeline#createStreams > - > > Key: HDFS-11301 > URL: https://issues.apache.org/jira/browse/HDFS-11301 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Minor > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11301.000.patch > > > In LocalReplicaInPipeline#createStreams, there is a WrappedFileOutputStream > created over a WrappedRandomAccessFile. This double layer of instrumentation > is unnecessary. > {quote} > blockOut = fileIoProvider.getFileOutputStream(getVolume(), > fileIoProvider.getRandomAccessFile(getVolume(), blockFile, "rw") > .getFD()); > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11299) Support multiple Datanode File IO hooks
[ https://issues.apache.org/jira/browse/HDFS-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812501#comment-15812501 ] Arpit Agarwal commented on HDFS-11299: -- Thank you for the updated patch [~hanishakoneru]. In addition to Xiaoyu's feedback, one typo in the setting name: dfs.datanode.enable.fileio.fault.injectio. _injectio --> injection_. The length field is a pre-existing bug but it's good to fix it. > Support multiple Datanode File IO hooks > --- > > Key: HDFS-11299 > URL: https://issues.apache.org/jira/browse/HDFS-11299 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11299.000.patch, HDFS-11299.001.patch > > > HDFS-10958 introduces instrumentation hooks around DataNode disk IO and > HDFS-10959 adds support for profiling hooks to expose latency statistics. > Instead of choosing only one hook using Config parameters, we want to add two > separate hooks - one for profiling and one for fault injection. The fault > injection hook will be useful for testing purposes. > This jira only introduces support for the fault injection hook. The > implementation for that will come later on. > Also, now Default and Counting FileIOEvents would not be needed as we can > control enabling the profiling and fault injection hooks using config > parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11301) Double wrapping over RandomAccessFile in LocalReplicaInPipeline#createStreams
[ https://issues.apache.org/jira/browse/HDFS-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-11301: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 Status: Resolved (was: Patch Available) Committed to trunk. The test failures look unrelated. Thanks for the contribution [~hanishakoneru]. > Double wrapping over RandomAccessFile in LocalReplicaInPipeline#createStreams > - > > Key: HDFS-11301 > URL: https://issues.apache.org/jira/browse/HDFS-11301 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Minor > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11301.000.patch > > > In LocalReplicaInPipeline#createStreams, there is a WrappedFileOutputStream > created over a WrappedRandomAccessFile. This double layer of instrumentation > is unnecessary. > {quote} > blockOut = fileIoProvider.getFileOutputStream(getVolume(), > fileIoProvider.getRandomAccessFile(getVolume(), blockFile, "rw") > .getFD()); > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11293) [SPS]: Local DN should be given preference as source node, when target available in same node
[ https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812385#comment-15812385 ] Rakesh R commented on HDFS-11293: - +1 on the latest patch. Pending Jenkins. > [SPS]: Local DN should be given preference as source node, when target > available in same node > - > > Key: HDFS-11293 > URL: https://issues.apache.org/jira/browse/HDFS-11293 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Yuanbo Liu >Assignee: Uma Maheswara Rao G >Priority: Critical > Attachments: HDFS-11293-HDFS-10285-00.patch, > HDFS-11293-HDFS-10285-01.patch, HDFS-11293-HDFS-10285-02.patch > > > In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica > info by block pool id. But in this situation: > {code} > datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}. > 1. the same block replica exists in A[DISK] and B[DISK]. > 2. the block pool id of datanode A and datanode B are the same. > {code} > Then we start to change the file's storage policy and move the block replica > in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at > this time, datanode A throws ReplicaAlreadyExistsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org