[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-5776: Attachment: HDFS-5776-v2.txt v2 should address the javadoc issue and the failed test case > Support 'hedged' reads in DFSClient > --- > > Key: HDFS-5776 > URL: https://issues.apache.org/jira/browse/HDFS-5776 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-5776-v2.txt, HDFS-5776.txt > > > This is a placeholder for the HDFS-side backport of > https://issues.apache.org/jira/browse/HBASE-7509 > The quorum read ability should be helpful especially for optimizing read outliers. > We can utilize "dfs.dfsclient.quorum.read.threshold.millis" & > "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read > ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we > could export the metric values of interest into the client system (e.g. HBase's > regionserver metrics). > The core logic is in the pread code path: we decide to go to the original > fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per > the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
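The hedged-read flow described in the issue above can be sketched as follows. This is a minimal illustration of the pattern only, with hypothetical names (HedgedReadSketch, hedgedRead); it is not the code in the HDFS-5776 patch, which works at the level of fetchBlockByteRange/fetchBlockByteRangeSpeculative inside DFSClient:

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

// Sketch of the hedged-read pattern: start a read against the first
// replica, and if it has not completed within the configured threshold,
// launch a second "hedge" read and return whichever finishes first.
class HedgedReadSketch {
    // Stand-in for the thread pool sized by
    // dfs.dfsclient.quorum.read.threadpool.size (hypothetical wiring).
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    static <T> T hedgedRead(Supplier<T> primary, Supplier<T> hedge,
                            long thresholdMillis) throws Exception {
        CompletionService<T> cs = new ExecutorCompletionService<>(POOL);
        cs.submit(primary::get);
        // Wait up to the threshold (dfs.dfsclient.quorum.read.threshold.millis
        // in the real config) for the primary read to finish.
        Future<T> first = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            return first.get();
        }
        // Primary is slow: hedge against another replica, take the winner.
        cs.submit(hedge::get);
        return cs.take().get();
    }
}
```

The point of the pattern is that the tail latency of a single slow DataNode is bounded by the threshold plus the latency of the second-fastest replica.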
[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
[ https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5794: Attachment: HDFS-5794.001.patch Thanks for the review, Nicholas and Colin! Rebase the patch and put CACHING last. > Fix the inconsistency of layout version number of > ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2 > --- > > Key: HDFS-5794 > URL: https://issues.apache.org/jira/browse/HDFS-5794 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Attachments: HDFS-5794.000.patch, HDFS-5794.001.patch > > > Currently in trunk, we have the layout version: > {code} > EDITLOG_ADD_BLOCK(-48, ...), > CACHING(-49, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); > {code} > And in branch-2, we have: > {code} > EDITLOG_SUPPORT_RETRYCACHE(-47, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); > {code} > We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus > EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change > ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written > by trunk and branch-2 have the same layout -50 but branch-2 cannot read the > -50 fsimage if it is written by trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
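The compatibility constraint behind "put CACHING last" can be illustrated with a toy model. This is hypothetical code (LayoutSketch, not the real o.a.h.hdfs.protocol.LayoutVersion enum): two branches can safely exchange fsimages only if every layout version number they both assign maps to the same feature, so the feature shared with branch-2 must keep -49 on trunk and the trunk-only CACHING feature must take the new number:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: a layout version number is only meaningful if both branches
// agree on which feature it denotes.
class LayoutSketch {
    static final Map<String, Integer> TRUNK = new LinkedHashMap<>();
    static final Map<String, Integer> BRANCH2 = new LinkedHashMap<>();
    static {
        // Trunk ordering after the fix: CACHING moved last, so the
        // features shared with branch-2 keep the same numbers.
        TRUNK.put("EDITLOG_ADD_BLOCK", -48);
        TRUNK.put("ADD_DATANODE_AND_STORAGE_UUIDS", -49);
        TRUNK.put("CACHING", -50);

        BRANCH2.put("EDITLOG_ADD_BLOCK", -48);
        BRANCH2.put("ADD_DATANODE_AND_STORAGE_UUIDS", -49);
        // branch-2 has no CACHING; -50 stays unused there.
    }

    // Compatible iff every feature present in both maps has the same number.
    static boolean compatible(Map<String, Integer> a, Map<String, Integer> b) {
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) return false;
        }
        return true;
    }
}
```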
[jira] [Updated] (HDFS-5748) Too much information shown in the dfs health page.
[ https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5748: - Status: Patch Available (was: Open) > Too much information shown in the dfs health page. > -- > > Key: HDFS-5748 > URL: https://issues.apache.org/jira/browse/HDFS-5748 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Haohui Mai > Attachments: HDFS-5748.000.patch, hdfs-5748.png > > > I've noticed that the node lists are shown in the default name node web page. > This may be fine for small clusters, but for clusters with 1000s of nodes, > this is not ideal. The following should be shown on demand. (Some of them > have been there even before the recent rework.) > - Detailed data node information > - Startup progress > - Snapshot information -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5748) Too much information shown in the dfs health page.
[ https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5748: - Issue Type: Improvement (was: Sub-task) Parent: (was: HDFS-5333) > Too much information shown in the dfs health page. > -- > > Key: HDFS-5748 > URL: https://issues.apache.org/jira/browse/HDFS-5748 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kihwal Lee >Assignee: Haohui Mai > Attachments: HDFS-5748.000.patch, hdfs-5748.png > > > I've noticed that the node lists are shown in the default name node web page. > This may be fine for small clusters, but for clusters with 1000s of nodes, > this is not ideal. The following should be shown on demand. (Some of them > have been there even before the recent rework.) > - Detailed data node information > - Startup progress > - Snapshot information -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5748) Too much information shown in the dfs health page.
[ https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5748: - Attachment: HDFS-5748.000.patch > Too much information shown in the dfs health page. > -- > > Key: HDFS-5748 > URL: https://issues.apache.org/jira/browse/HDFS-5748 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Haohui Mai > Attachments: HDFS-5748.000.patch, hdfs-5748.png > > > I've noticed that the node lists are shown in the default name node web page. > This may be fine for small clusters, but for clusters with 1000s of nodes, > this is not ideal. The following should be shown on demand. (Some of them > have been there even before the recent rework.) > - Detailed data node information > - Startup progress > - Snapshot information -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5748) Too much information shown in the dfs health page.
[ https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5748: - Attachment: hdfs-5748.png Convert information into tabs. > Too much information shown in the dfs health page. > -- > > Key: HDFS-5748 > URL: https://issues.apache.org/jira/browse/HDFS-5748 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Haohui Mai > Attachments: HDFS-5748.000.patch, hdfs-5748.png > > > I've noticed that the node lists are shown in the default name node web page. > This may be fine for small clusters, but for clusters with 1000s of nodes, > this is not ideal. The following should be shown on demand. (Some of them > have been there even before the recent rework.) > - Detailed data node information > - Startup progress > - Snapshot information -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874469#comment-13874469 ] Hadoop QA commented on HDFS-5138: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623568/HDFS-5138.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal: org.apache.hadoop.hdfs.server.namenode.TestBackupNode {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5907//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5907//console This message is automatically generated. > Support HDFS upgrade in HA > -- > > Key: HDFS-5138 > URL: https://issues.apache.org/jira/browse/HDFS-5138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.1-beta >Reporter: Kihwal Lee >Assignee: Aaron T. 
Myers >Priority: Blocker > Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch > > > With HA enabled, NN won't start with "-upgrade". Since there has been a layout > version change between 2.0.x and 2.1.x, starting NN in upgrade mode was > necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way > to get around this was to disable HA and upgrade. > The NN and the cluster cannot be flipped back to HA until the upgrade is > finalized. If HA is disabled only on NN for layout upgrade and HA is turned > back on without involving DNs, things will work, but finalizeUpgrade won't > work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade > snapshots won't get removed. > We will need a different way of doing layout upgrade and upgrade snapshot. > I am marking this as a 2.1.1-beta blocker based on feedback from others. If > there is a reasonable workaround that does not increase maintenance window > greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5138: - Attachment: HDFS-5138.patch Here's an updated patch which implements Suresh's suggestion of doing away with the shared log lock and instead requiring that both NNs be shut down during the upgrade and the second NN re-bootstrapped after the first NN has performed the upgrade. I think this makes the procedure a little more complex for operators, but it certainly does simplify the code. [~sureshms] - I don't feel super strongly about it either way, so please let me know if you find this more palatable. I'd appreciate a prompt review as this JIRA has been going on for quite some time and I'd like to get it done. > Support HDFS upgrade in HA > -- > > Key: HDFS-5138 > URL: https://issues.apache.org/jira/browse/HDFS-5138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.1-beta >Reporter: Kihwal Lee >Assignee: Aaron T. Myers >Priority: Blocker > Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch > > > With HA enabled, NN won't start with "-upgrade". Since there has been a layout > version change between 2.0.x and 2.1.x, starting NN in upgrade mode was > necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way > to get around this was to disable HA and upgrade. > The NN and the cluster cannot be flipped back to HA until the upgrade is > finalized. If HA is disabled only on NN for layout upgrade and HA is turned > back on without involving DNs, things will work, but finalizeUpgrade won't > work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade > snapshots won't get removed. > We will need a different way of doing layout upgrade and upgrade snapshot. > I am marking this as a 2.1.1-beta blocker based on feedback from others. 
If > there is a reasonable workaround that does not increase maintenance window > greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5153) Datanode should send block reports for each storage in a separate message
[ https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874362#comment-13874362 ] Hadoop QA commented on HDFS-5153: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623541/HDFS-5153.04.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5906//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5906//console This message is automatically generated. > Datanode should send block reports for each storage in a separate message > - > > Key: HDFS-5153 > URL: https://issues.apache.org/jira/browse/HDFS-5153 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, > HDFS-5153.03b.patch, HDFS-5153.04.patch > > > When the number of blocks on the DataNode grows large we start running into a > few issues: > # Block reports take a long time to process on the NameNode. 
In testing we > have seen that a block report with 6 Million blocks takes close to one second > to process on the NameNode. The NameSystem write lock is held during this > time. > # We start hitting the default protobuf message limit of 64MB somewhere > around 10 Million blocks. While we can increase the message size limit, it > already takes over 7 seconds to serialize/deserialize a block report of this > size. > HDFS-2832 has introduced the concept of a DataNode as a collection of > storages, i.e. the NameNode is aware of all the volumes (storage directories) > attached to a given DataNode. This makes it easy to split block reports from > the DN by sending one report per storage directory to mitigate the above > problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
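The per-storage split described in the issue above can be sketched as follows. The types here are hypothetical stand-ins (BlockReportSplitter, a storage-id-to-block-ids map), not the actual DataNode protocol classes; the point is that each RPC message is then bounded by the size of one volume rather than the whole node:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: instead of one block report covering every block on the node,
// emit one report per storage directory.
class BlockReportSplitter {
    // Input: storageId -> block ids stored under that directory.
    // Output: one (storageId, blocks) report per storage.
    static List<Map.Entry<String, List<Long>>> splitByStorage(
            Map<String, List<Long>> blocksPerStorage) {
        List<Map.Entry<String, List<Long>>> reports = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : blocksPerStorage.entrySet()) {
            // Each entry becomes an independent message, so no single
            // message grows with the total block count of the DataNode.
            reports.add(Map.entry(e.getKey(), List.copyOf(e.getValue())));
        }
        return reports;
    }
}
```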
[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
[ https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5794: - Component/s: namenode Priority: Minor (was: Major) +1 patch looks good. > Fix the inconsistency of layout version number of > ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2 > --- > > Key: HDFS-5794 > URL: https://issues.apache.org/jira/browse/HDFS-5794 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Attachments: HDFS-5794.000.patch > > > Currently in trunk, we have the layout version: > {code} > EDITLOG_ADD_BLOCK(-48, ...), > CACHING(-49, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); > {code} > And in branch-2, we have: > {code} > EDITLOG_SUPPORT_RETRYCACHE(-47, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); > {code} > We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus > EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change > ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written > by trunk and branch-2 have the same layout -50 but branch-2 cannot read the > -50 fsimage if it is written by trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section
[ https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874341#comment-13874341 ] Hudson commented on HDFS-5784: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5015 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5015/]) HDFS-5784. Reserve space in edit log header and fsimage header for feature flag section (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558974) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutFlags.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournalNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml > reserve space in edit log header and fsimage header for feature flag section > > > Key: HDFS-5784 > URL: https://issues.apache.org/jira/browse/HDFS-5784 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: 
namenode >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, > HDFS-5784.003.patch > > > We should reserve space in the edit log header and fsimage header so that we > can add layout feature flags later in a compatible manner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
[ https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874328#comment-13874328 ] Hadoop QA commented on HDFS-5794: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623524/HDFS-5794.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5905//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5905//console This message is automatically generated. 
> Fix the inconsistency of layout version number of > ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2 > --- > > Key: HDFS-5794 > URL: https://issues.apache.org/jira/browse/HDFS-5794 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5794.000.patch > > > Currently in trunk, we have the layout version: > {code} > EDITLOG_ADD_BLOCK(-48, ...), > CACHING(-49, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); > {code} > And in branch-2, we have: > {code} > EDITLOG_SUPPORT_RETRYCACHE(-47, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); > {code} > We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus > EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change > ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written > by trunk and branch-2 have the same layout -50 but branch-2 cannot read the > -50 fsimage if it is written by trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section
[ https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5784: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > reserve space in edit log header and fsimage header for feature flag section > > > Key: HDFS-5784 > URL: https://issues.apache.org/jira/browse/HDFS-5784 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, > HDFS-5784.003.patch > > > We should reserve space in the edit log header and fsimage header so that we > can add layout feature flags later in a compatible manner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section
[ https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874307#comment-13874307 ] Colin Patrick McCabe commented on HDFS-5784: backport to branch-2 is pending the layout version fix in HDFS-5794 > reserve space in edit log header and fsimage header for feature flag section > > > Key: HDFS-5784 > URL: https://issues.apache.org/jira/browse/HDFS-5784 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, > HDFS-5784.003.patch > > > We should reserve space in the edit log header and fsimage header so that we > can add layout feature flags later in a compatible manner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-5748) Too much information shown in the dfs health page.
[ https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai reassigned HDFS-5748: Assignee: Haohui Mai > Too much information shown in the dfs health page. > -- > > Key: HDFS-5748 > URL: https://issues.apache.org/jira/browse/HDFS-5748 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Haohui Mai > > I've noticed that the node lists are shown in the default name node web page. > This may be fine for small clusters, but for clusters with 1000s of nodes, > this is not ideal. The following should be shown on demand. (Some of them > have been there even before the recent rework.) > - Detailed data node information > - Startup progress > - Snapshot information -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
[ https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874306#comment-13874306 ] Colin Patrick McCabe commented on HDFS-5794: +1 for this approach of putting CACHING last, once you've rebased and re-run jenkins. HDFS-5784 added another layout version which CACHING should come after. > Fix the inconsistency of layout version number of > ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2 > --- > > Key: HDFS-5794 > URL: https://issues.apache.org/jira/browse/HDFS-5794 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5794.000.patch > > > Currently in trunk, we have the layout version: > {code} > EDITLOG_ADD_BLOCK(-48, ...), > CACHING(-49, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); > {code} > And in branch-2, we have: > {code} > EDITLOG_SUPPORT_RETRYCACHE(-47, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); > {code} > We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus > EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change > ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written > by trunk and branch-2 have the same layout -50 but branch-2 cannot read the > -50 fsimage if it is written by trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5795) RemoteBlockReader2#checkSuccess() should print error status
Brandon Li created HDFS-5795: Summary: RemoteBlockReader2#checkSuccess() should print error status Key: HDFS-5795 URL: https://issues.apache.org/jira/browse/HDFS-5795 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Brandon Li Priority: Trivial RemoteBlockReader2#checkSuccess() doesn't print the error status, which makes debugging harder when the client can't read from a DataNode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
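The improvement requested above amounts to putting the status into the exception message. The following is a hypothetical, simplified sketch (CheckSuccessSketch with a stand-in Status enum), not the actual RemoteBlockReader2 code or the eventual patch:

```java
import java.io.IOException;

// Sketch: when the read response is not SUCCESS, include the status value
// itself in the exception so the client log shows *why* the DataNode
// refused the read, not just that it failed.
class CheckSuccessSketch {
    enum Status { SUCCESS, ERROR, ERROR_ACCESS_TOKEN, ERROR_UNSUPPORTED }

    static void checkSuccess(Status status, String datanode, String block)
            throws IOException {
        if (status != Status.SUCCESS) {
            throw new IOException("Got error, status=" + status
                + ", for block " + block + " from datanode " + datanode);
        }
    }
}
```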
[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874247#comment-13874247 ] Jing Zhao commented on HDFS-5709: - The patch looks good to me. Some comments: # In FSEditLogLoader, we also need to handle "OP_ADD_BLOCK" after HDFS-5704 got committed recently. # It will be better to let renameReservedPathsOnUpgrade and renameReservedComponentOnUpgrade take a string as the third parameter instead of an instance of FSNamesystem. We may also want to pass in the to-be-replaced string as a parameter, and in that case, we can make these two methods act as more generic utility methods so that it can be used for other reserved names (e.g., "/.reserved/.inodes", which we may want to handle in a separate jira). # Then we can have another two methods in FSImageFormat/FSEditLogLoader to call the util methods, where we can check layoutversion and new names. # When checking the new name, I think we should follow the rules in DFSUtil#isValidName? # In renameReservedComponentOnUpgrade, maybe we do not need to convert the byte[] to string for the comparison? We have DOT_SNAPSHOT_DIR_BYTES defined in HdfsConstants already. # There is an unused import in FSImageFormat. > Improve upgrade with existing files and directories named ".snapshot" > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
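Review comment 5 above (compare bytes directly instead of converting to a String) can be sketched as follows. The helper and replacement name are hypothetical, not the code in the patch under review:

```java
import java.util.Arrays;

// Sketch of renaming a reserved ".snapshot" path component during upgrade,
// using a byte-level comparison against the reserved bytes (analogous to
// HdfsConstants.DOT_SNAPSHOT_DIR_BYTES) rather than a byte[] -> String
// conversion.
class RenameReservedSketch {
    static final byte[] DOT_SNAPSHOT_BYTES = ".snapshot".getBytes();

    static byte[] renameReservedComponent(byte[] component, byte[] replacement) {
        if (Arrays.equals(component, DOT_SNAPSHOT_BYTES)) {
            // Reserved name found in the old fsimage/edits: rewrite it.
            return replacement;
        }
        return component;
    }
}
```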
[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery
[ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874219#comment-13874219 ] Todd Lipcon commented on HDFS-5790: --- As a quick check of the above, I did the following patch:
{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java b/hadoop-hdfs-p
index 8b5fb81..62e60da 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
@@ -40,6 +40,7 @@ import org.apache.hadoop.util.Daemon;
 import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Objects;
 import com.google.common.base.Preconditions;
 /**
@@ -256,6 +257,16 @@ public boolean expiredSoftLimit() {
    * @return the path associated with the pendingFile and null if not found.
    */
   private String findPath(INodeFile pendingFile) {
+    String retOrig = findPathOrig(pendingFile);
+    String retNew = pendingFile.getFullPathName();
+    if (!Objects.equal(retOrig, retNew)) {
+      throw new AssertionError("orig implementation found: " + retOrig +
+          " new implementation found: " + retNew);
+    }
+    return retNew;
+  }
+
+  private String findPathOrig(INodeFile pendingFile) {
     try {
       for (String src : paths) {
         INode node = fsnamesystem.dir.getINode(src);
{code}
that is to say, I try the suggested optimization, along with the original implementation, and verify that they return the same results. I ran all the HDFS tests and they all passed, indicating that it's likely this optimization wouldn't break anything. And, it should be much faster, since it's O(directory depth) instead of O(number of leases held by the client * those leases' directory depths). Anyone have opinions on this? [~kihwal] or [~daryn] maybe? 
(seem to recall both of you working in this area a few months back) > LeaseManager.findPath is very slow when many leases need recovery > - > > Key: HDFS-5790 > URL: https://issues.apache.org/jira/browse/HDFS-5790 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, performance >Affects Versions: 2.4.0 >Reporter: Todd Lipcon > > We recently saw an issue where the NN restarted while tens of thousands of > files were open. The NN then ended up spending multiple seconds for each > commitBlockSynchronization() call, spending most of its time inside > LeaseManager.findPath(). findPath currently works by looping over all files > held for a given writer, and traversing the filesystem for each one. This > takes way too long when tens of thousands of files are open by a single > writer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
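The O(directory depth) claim in the comment above comes from building the path by walking parent pointers. The following toy inode (FullPathSketch, not the real INodeFile) illustrates why the cost is independent of how many leases the writer holds:

```java
// Toy model of getFullPathName(): walk parent links from the file up to
// the root, so the work is proportional to the file's depth only.
class FullPathSketch {
    static class INode {
        final String name;
        final INode parent;
        INode(String name, INode parent) { this.name = name; this.parent = parent; }

        String getFullPathName() {
            StringBuilder sb = new StringBuilder();
            buildPath(this, sb);
            return sb.length() == 0 ? "/" : sb.toString();
        }

        private static void buildPath(INode node, StringBuilder sb) {
            if (node.parent == null) return;      // root contributes nothing
            buildPath(node.parent, sb);           // prepend ancestors first
            sb.append('/').append(node.name);
        }
    }
}
```

By contrast, the original findPath loops over every path held by the lease holder and resolves each one from the root, which is what made commitBlockSynchronization() slow with tens of thousands of open files.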
[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874217#comment-13874217 ] Jing Zhao commented on HDFS-5709: - Thanks for the work Andrew! I'm now reviewing the patch and will post my comments soon. > Improve upgrade with existing files and directories named ".snapshot" > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5434) Write resiliency for replica count 1
[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874212#comment-13874212 ] Arpit Agarwal commented on HDFS-5434: - I think you are right, good catch. > Write resiliency for replica count 1 > > > Key: HDFS-5434 > URL: https://issues.apache.org/jira/browse/HDFS-5434 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Buddy >Priority: Minor > Attachments: BlockPlacementPolicyMinPipelineSize.java, > BlockPlacementPolicyMinPipelineSizeWithNodeGroup.java, > HDFS-5434-branch-2.patch, HDFS_5434.patch > > > If a file has a replica count of one, the HDFS client is exposed to write > failures if the data node fails during a write. With a pipeline of size of > one, no recovery is possible if the sole data node dies. > A simple fix is to force a minimum pipeline size of 2, while leaving the > replication count as 1. The implementation for this is fairly non-invasive. > Although the replica count is one, the block will be written to two data > nodes instead of one. If one of the data nodes fails during the write, normal > pipeline recovery will ensure that the write succeeds to the surviving data > node. > The existing code in the name node will prune the extra replica when it > receives the block received reports for the finalized block from both data > nodes. This results in the intended replica count of one for the block. > This behavior should be controlled by a configuration option such as > {{dfs.namenode.minPipelineSize}}. > This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by > ensuring that the pipeline size passed to > {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: > {code} > max(replication, ${dfs.namenode.minPipelineSize}) > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
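The {{max(replication, ${dfs.namenode.minPipelineSize})}} expression quoted above is the whole change; a minimal, hypothetical sketch of the floor (the real change would live inside {{FSNamesystem.getAdditionalBlock()}}, not in a helper class like this):

```java
// Hypothetical sketch of the proposed pipeline-size floor; names here are
// illustrative, not the actual Hadoop code.
public class PipelineSizeSketch {

    // dfs.namenode.minPipelineSize would default to 2 under this proposal.
    static final int MIN_PIPELINE_SIZE = 2;

    // Pipeline size passed to BlockPlacementPolicy.chooseTarget(): at least
    // MIN_PIPELINE_SIZE datanodes, even when the requested replication is 1.
    // The NameNode later prunes the extra replica back down to 'replication'.
    static int pipelineSize(int replication) {
        return Math.max(replication, MIN_PIPELINE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(pipelineSize(1)); // writes go to 2 datanodes
        System.out.println(pipelineSize(3)); // unchanged for replication >= 2
    }
}
```

With replication 1 the write pipeline still has two datanodes, so normal pipeline recovery can ride out a single datanode failure during the write.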
[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874210#comment-13874210 ] Hadoop QA commented on HDFS-5709: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623493/hdfs-5709-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDFSUpgradeFromImage {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5904//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5904//console This message is automatically generated. > Improve upgrade with existing files and directories named ".snapshot" > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". 
We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5153) Datanode should send block reports for each storage in a separate message
[ https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5153: Attachment: HDFS-5153.04.patch > Datanode should send block reports for each storage in a separate message > - > > Key: HDFS-5153 > URL: https://issues.apache.org/jira/browse/HDFS-5153 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, > HDFS-5153.03b.patch, HDFS-5153.04.patch > > > When the number of blocks on the DataNode grows large we start running into a > few issues: > # Block reports take a long time to process on the NameNode. In testing we > have seen that a block report with 6 Million blocks takes close to one second > to process on the NameNode. The NameSystem write lock is held during this > time. > # We start hitting the default protobuf message limit of 64MB somewhere > around 10 Million blocks. While we can increase the message size limit it > already takes over 7 seconds to serialize/unserialize a block report of this > size. > HDFS-2832 has introduced the concept of a DataNode as a collection of > storages i.e. the NameNode is aware of all the volumes (storage directories) > attached to a given DataNode. This makes it easy to split block reports from > the DN by sending one report per storage directory to mitigate the above > problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5153) Datanode should send block reports for each storage in a separate message
[ https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5153: Attachment: (was: HDFS-5153.04.patch) > Datanode should send block reports for each storage in a separate message > - > > Key: HDFS-5153 > URL: https://issues.apache.org/jira/browse/HDFS-5153 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, > HDFS-5153.03b.patch > > > When the number of blocks on the DataNode grows large we start running into a > few issues: > # Block reports take a long time to process on the NameNode. In testing we > have seen that a block report with 6 Million blocks takes close to one second > to process on the NameNode. The NameSystem write lock is held during this > time. > # We start hitting the default protobuf message limit of 64MB somewhere > around 10 Million blocks. While we can increase the message size limit it > already takes over 7 seconds to serialize/unserialize a block report of this > size. > HDFS-2832 has introduced the concept of a DataNode as a collection of > storages i.e. the NameNode is aware of all the volumes (storage directories) > attached to a given DataNode. This makes it easy to split block reports from > the DN by sending one report per storage directory to mitigate the above > problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5153) Datanode should send block reports for each storage in a separate message
[ https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5153: Attachment: HDFS-5153.04.patch Updated patch with the following changes: # Introduces new conf key {{DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY}}. If the number of blocks on the DN is below this threshold, block reports for all storages are sent in a single message; otherwise the block reports are split across messages. The default value is currently 1 million blocks. # When splitting, reports for all storages are sent in quick succession. In the future we can consider spacing them apart. # The 'staleness' of a DN is determined by whether all storages have reported since the last restart/failover. # Added new test class {{TestDnRespectsBlockReportSplitThreshold}}. # Refactored existing test {{TestBlockReport}} into base class {{BlockReportTestBase}} and derived classes for readability. > Datanode should send block reports for each storage in a separate message > - > > Key: HDFS-5153 > URL: https://issues.apache.org/jira/browse/HDFS-5153 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, > HDFS-5153.03b.patch, HDFS-5153.04.patch > > > When the number of blocks on the DataNode grows large we start running into a > few issues: > # Block reports take a long time to process on the NameNode. In testing we > have seen that a block report with 6 Million blocks takes close to one second > to process on the NameNode. The NameSystem write lock is held during this > time. > # We start hitting the default protobuf message limit of 64MB somewhere > around 10 Million blocks. While we can increase the message size limit it > already takes over 7 seconds to serialize/unserialize a block report of this > size. > HDFS-2832 has introduced the concept of a DataNode as a collection of > storages i.e. 
the NameNode is aware of all the volumes (storage directories) > attached to a given DataNode. This makes it easy to split block reports from > the DN by sending one report per storage directory to mitigate the above > problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
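Arpit's split rule above (one combined message below the threshold, one message per storage otherwise) can be sketched roughly as follows; {{planReports}} and its shape are illustrative, not the patch's actual API, and real block reports are protobuf messages:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Rough sketch of the split decision controlled by the proposed
// DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY (default: 1 million blocks).
public class BlockReportSplitSketch {

    // Returns one combined report when the total block count is at or below
    // the threshold, otherwise one report per storage directory.
    static List<List<Long>> planReports(Map<String, List<Long>> blocksPerStorage,
                                        long splitThreshold) {
        long total = 0;
        for (List<Long> blocks : blocksPerStorage.values()) {
            total += blocks.size();
        }
        List<List<Long>> reports = new ArrayList<>();
        if (total <= splitThreshold) {
            List<Long> combined = new ArrayList<>();
            for (List<Long> blocks : blocksPerStorage.values()) {
                combined.addAll(blocks);
            }
            reports.add(combined); // single message covering all storages
        } else {
            for (List<Long> blocks : blocksPerStorage.values()) {
                reports.add(new ArrayList<>(blocks)); // one message per storage
            }
        }
        return reports;
    }
}
```

Splitting keeps each message well under the 64MB protobuf limit and shortens the time the NameSystem write lock is held per report.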
[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section
[ https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874190#comment-13874190 ] Andrew Wang commented on HDFS-5784: --- LGTM, +1 > reserve space in edit log header and fsimage header for feature flag section > > > Key: HDFS-5784 > URL: https://issues.apache.org/jira/browse/HDFS-5784 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, > HDFS-5784.003.patch > > > We should reserve space in the edit log header and fsimage header so that we > can add layout feature flags later in a compatible manner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section
[ https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874184#comment-13874184 ] Colin Patrick McCabe commented on HDFS-5784: The failure in TestOfflineEditsViewer is because jenkins can't update the binary editsStored file (it can't yet do binary diffs), and is expected. > reserve space in edit log header and fsimage header for feature flag section > > > Key: HDFS-5784 > URL: https://issues.apache.org/jira/browse/HDFS-5784 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, > HDFS-5784.003.patch > > > We should reserve space in the edit log header and fsimage header so that we > can add layout feature flags later in a compatible manner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874185#comment-13874185 ] Chris Nauroth commented on HDFS-5758: - bq. I'm referring to distinguishing the ACLs that are generated and the ones are specified by setfacl / chmod. There are exactly 3 ACL entries that are inferred from permission bits: the owner, group and other entries. These are never really "generated", because they always originated from someone's setfacl or chmod call. If you feel strongly about this, let me know, and I'll change the patch to filter those 3 entries from the output of {{getAclStatus}} and then add them back in the getfacl CLI. I don't think the distinction provides an end user with any valuable information though. bq. I see it as an optimization. Can you keep it in a separate patch? No, reducing to a minimal ACL is a matter of correctness rather than optimization, so I don't think it can be separated to a different patch. Those 4 scenarios all eliminate the extended ACL, and correctness requires that we turn off the ACL bit. (See example below.) I suppose dropping the {{AclFeature}} could be thought of as an optimization, but it's going to be a tiny patch if I separate just that part. bq. Based on your description, it seems to me that in removeAcl, the task can be done via looking up the group entries and set the permission back. Yes, we can do that. I'll put together a new patch for that. 
{code}
[cnauroth@ubuntu:pts/0] acltest > touch file1
[cnauroth@ubuntu:pts/0] acltest > setfacl -m user:bruce:rwx file1
[cnauroth@ubuntu:pts/0] acltest > ls -lrt file1
-rw-rwxr--+ 1 cnauroth 0 Jan 16 15:58 file1*
[cnauroth@ubuntu:pts/0] acltest > getfacl file1
# file: file1
# owner: cnauroth
# group: cnauroth
user::rw-
user:bruce:rwx
group::rw-
mask::rwx
other::r--
[cnauroth@ubuntu:pts/0] acltest > setfacl -x user:bruce,mask file1
[cnauroth@ubuntu:pts/0] acltest > ls -lrt file1
-rw-rw-r-- 1 cnauroth 0 Jan 16 15:58 file1
[cnauroth@ubuntu:pts/0] acltest > getfacl file1
# file: file1
# owner: cnauroth
# group: cnauroth
user::rw-
group::rw-
other::r--
{code}
> NameNode: complete implementation of inode modifications for ACLs. > -- > > Key: HDFS-5758 > URL: https://issues.apache.org/jira/browse/HDFS-5758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch > > > This patch will complete the remaining logic for the ACL get and set APIs, > including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in > the inodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section
[ https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874162#comment-13874162 ] Hadoop QA commented on HDFS-5784: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623472/HDFS-5784.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5903//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5903//console This message is automatically generated. 
> reserve space in edit log header and fsimage header for feature flag section > > > Key: HDFS-5784 > URL: https://issues.apache.org/jira/browse/HDFS-5784 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, > HDFS-5784.003.patch > > > We should reserve space in the edit log header and fsimage header so that we > can add layout feature flags later in a compatible manner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
[ https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5794: Status: Patch Available (was: Open) > Fix the inconsistency of layout version number of > ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2 > --- > > Key: HDFS-5794 > URL: https://issues.apache.org/jira/browse/HDFS-5794 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5794.000.patch > > > Currently in trunk, we have the layout version: > {code} > EDITLOG_ADD_BLOCK(-48, ...), > CACHING(-49, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); > {code} > And in branch-2, we have: > {code} > EDITLOG_SUPPORT_RETRYCACHE(-47, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); > {code} > We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus > EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change > ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written > by trunk and branch-2 have the same layout -50 but branch-2 cannot read the > -50 fsimage if it is written by trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
[ https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5794: Attachment: HDFS-5794.000.patch One solution for this issue is to switch CACHING and ADD_DATANODE_AND_STORAGE_UUIDS in trunk. Post a simple patch following this solution. > Fix the inconsistency of layout version number of > ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2 > --- > > Key: HDFS-5794 > URL: https://issues.apache.org/jira/browse/HDFS-5794 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5794.000.patch > > > Currently in trunk, we have the layout version: > {code} > EDITLOG_ADD_BLOCK(-48, ...), > CACHING(-49, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); > {code} > And in branch-2, we have: > {code} > EDITLOG_SUPPORT_RETRYCACHE(-47, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); > {code} > We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus > EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change > ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written > by trunk and branch-2 have the same layout -50 but branch-2 cannot read the > -50 fsimage if it is written by trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
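The proposed swap keeps ADD_DATANODE_AND_STORAGE_UUIDS at -49 on both branches and pushes the trunk-only CACHING feature to -50. A hedged sketch of the resulting trunk ordering (illustrative only; the actual {{LayoutVersion.Feature}} constructor takes additional arguments such as the ancestor version and a description):

```java
// Illustrative enum only, showing the proposed ordering after the swap.
public class LayoutVersionSketch {
    enum Feature {
        EDITLOG_ADD_BLOCK(-48),
        ADD_DATANODE_AND_STORAGE_UUIDS(-49), // now matches branch-2's -49
        CACHING(-50);                        // trunk-only feature moved last

        final int layoutVersion;
        Feature(int lv) { this.layoutVersion = lv; }
    }

    public static void main(String[] args) {
        // Both trunk and branch-2 now agree on -49 for the shared feature,
        // so a -49 fsimage means the same thing on either branch.
        System.out.println(Feature.ADD_DATANODE_AND_STORAGE_UUIDS.layoutVersion);
    }
}
```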
[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874125#comment-13874125 ] Andrew Wang commented on HDFS-5709: --- Woops, I should have mentioned that one. The config only needs to be validated if it needs to be used (i.e. we encounter a .snapshot path), so I felt it was better to keep all the validation inside that method. Thanks again ATM. > Improve upgrade with existing files and directories named ".snapshot" > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
Jing Zhao created HDFS-5794: --- Summary: Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2 Key: HDFS-5794 URL: https://issues.apache.org/jira/browse/HDFS-5794 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently in trunk, we have the layout version: {code} EDITLOG_ADD_BLOCK(-48, ...), CACHING(-49, ...), ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); {code} And in branch-2, we have: {code} EDITLOG_SUPPORT_RETRYCACHE(-47, ...), ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); {code} We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written by trunk and branch-2 have the same layout -50 but branch-2 cannot read the -50 fsimage if it is written by trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
[ https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5794: Description: Currently in trunk, we have the layout version: {code} EDITLOG_ADD_BLOCK(-48, ...), CACHING(-49, ...), ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); {code} And in branch-2, we have: {code} EDITLOG_SUPPORT_RETRYCACHE(-47, ...), ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); {code} We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written by trunk and branch-2 have the same layout -50 but branch-2 cannot read the -50 fsimage if it is written by trunk. was: Currently in trunk, we have the layout version: {code} EDITLOG_ADD_BLOCK(-48, ...), CACHING(-49, ...), ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); {code} And in branch-2, we have: {code} EDITLOG_SUPPORT_RETRYCACHE(-47, ...), ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); {code} We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written by trunk and branch-2 have the same layout -50 but branch-2 cannot read the -50 fsimage if it is written by trunk. 
> Fix the inconsistency of layout version number of > ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2 > --- > > Key: HDFS-5794 > URL: https://issues.apache.org/jira/browse/HDFS-5794 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently in trunk, we have the layout version: > {code} > EDITLOG_ADD_BLOCK(-48, ...), > CACHING(-49, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...); > {code} > And in branch-2, we have: > {code} > EDITLOG_SUPPORT_RETRYCACHE(-47, ...), > ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...); > {code} > We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus > EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change > ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written > by trunk and branch-2 have the same layout -50 but branch-2 cannot read the > -50 fsimage if it is written by trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5791) TestHttpsFileSystem should use a random port to avoid binding error during testing
[ https://issues.apache.org/jira/browse/HDFS-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5791: - Affects Version/s: 3.0.0 > TestHttpsFileSystem should use a random port to avoid binding error during > testing > -- > > Key: HDFS-5791 > URL: https://issues.apache.org/jira/browse/HDFS-5791 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Brandon Li > > {noformat} > org.apache.hadoop.hdfs.web.TestHttpsFileSystem.org.apache.hadoop.hdfs.web.TestHttpsFileSystem > Failing for the past 1 build (Since Failed#5900 ) > Took 2.7 sec. > Error Message > Port in use: localhost:50475 > Stacktrace > java.net.BindException: Port in use: localhost:50475 > at java.net.PlainSocketImpl.socketBind(Native Method) > at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:383) > at java.net.ServerSocket.bind(ServerSocket.java:328) > at java.net.ServerSocket.<init>(ServerSocket.java:194) > at javax.net.ssl.SSLServerSocket.<init>(SSLServerSocket.java:106) > at > com.sun.net.ssl.internal.ssl.SSLServerSocketImpl.<init>(SSLServerSocketImpl.java:108) > at > com.sun.net.ssl.internal.ssl.SSLServerSocketFactoryImpl.createServerSocket(SSLServerSocketFactoryImpl.java:72) > at > org.mortbay.jetty.security.SslSocketConnector.newServerSocket(SslSocketConnector.java:478) > at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73) > at org.apache.hadoop.http.HttpServer.openListeners(HttpServer.java:973) > at org.apache.hadoop.http.HttpServer.start(HttpServer.java:914) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:413) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:770) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:316) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1847) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1747) > at > 
org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1217) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:683) > at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:351) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:332) > at > org.apache.hadoop.hdfs.web.TestHttpsFileSystem.setUp(TestHttpsFileSystem.java:64) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
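The usual fix for "Port in use" failures like the one above is to bind to port 0 so the kernel assigns a free ephemeral port; whether the eventual patch takes exactly this route is an assumption, but the pattern looks like:

```java
import java.net.ServerSocket;

// Binding to port 0 asks the kernel for any free ephemeral port, which
// avoids collisions between concurrent test runs on fixed ports like 50475.
public class RandomPortSketch {

    static int pickFreePort() throws Exception {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort(); // the port the kernel actually chose
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pickFreePort());
    }
}
```

In practice, MiniDFSCluster-style tests tend to pass 0 straight into the server's bind address rather than probing for a port first, since a probed-and-released port can be taken by another process before it is reused.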
[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874113#comment-13874113 ] Haohui Mai commented on HDFS-5758: -- bq. This sounds like a scenario where one person is responsible for chmod and another person is responsible for setfacl, and you want to be able to tell who did which part of it. Is that right? I'm referring to distinguishing the ACLs that are generated and the ones are specified by setfacl / chmod. bq. There are 4 distinct scenarios that can cause reduction of an extended ACL to a minimal ACL: I see it as an optimization. Can you keep it in a separate patch? Based on your description, it seems to me that in {{removeAcl}}, the task can be done via looking up the group entries and set the permission back. Note that the look up can be done efficiently since the list is sorted. You might be able to get rid of the filter function. > NameNode: complete implementation of inode modifications for ACLs. > -- > > Key: HDFS-5758 > URL: https://issues.apache.org/jira/browse/HDFS-5758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch > > > This patch will complete the remaining logic for the ACL get and set APIs, > including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in > the inodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large
[ https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874107#comment-13874107 ] Nathan Roberts commented on HDFS-5788: -- A simple solution is: Restrict the size to dfs.ls.limit (default 1000) files OR dfs.ls.limit block locations, whichever comes first (obviously always returning only whole entries, so we could send more than this number of locations) Yes, it will require more RPCs. However, it would seem to lower the risk of a DoS. > listLocatedStatus response can be very large > > > Key: HDFS-5788 > URL: https://issues.apache.org/jira/browse/HDFS-5788 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > > Currently we limit the size of listStatus requests to a default of 1000 > entries. This works fine except in the case of listLocatedStatus where the > location information can be quite large. As an example, a directory with 7000 > entries, 4 blocks each, 3 way replication - a listLocatedStatus response is > over 1MB. This can chew up very large amounts of memory in the NN if lots of > clients try to do this simultaneously. > Seems like it would be better if we also considered the amount of location > information being returned when deciding how many files to return. > Patch will follow shortly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
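Nathan's dual limit above (stop after dfs.ls.limit files OR dfs.ls.limit block locations, whichever comes first, always returning whole entries) can be sketched as follows; the helper and its shape are illustrative, not the actual NameNode listing code:

```java
// Illustrative only: decides how many whole entries one listLocatedStatus
// response should return, given the block-location count of each entry. A
// single limit bounds both files and locations; because entries are never
// split, the last entry may push the location total past the limit.
public class LsLimitSketch {

    static int entriesToReturn(int[] locationsPerEntry, int limit) {
        int files = 0;
        int locations = 0;
        for (int locs : locationsPerEntry) {
            if (files >= limit || locations >= limit) {
                break; // either bound reached: stop before this entry
            }
            files++;
            locations += locs; // whole entry returned, even if this overshoots
        }
        return files;
    }

    public static void main(String[] args) {
        // The directory from the issue description: 7000 entries, 4 blocks
        // each, 3-way replication = 12 locations per entry.
        int[] dir = new int[7000];
        java.util.Arrays.fill(dir, 12);
        System.out.println(entriesToReturn(dir, 1000)); // prints 84
    }
}
```

For the 7000-entry directory from the description, the location bound kicks in after 84 entries (84 x 12 = 1008 locations), so each RPC response stays small at the cost of more round trips.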
[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874078#comment-13874078 ] Chris Nauroth commented on HDFS-5758: - bq. It's difficult for the client to differentiate the ACLs you've generated and the ones that the admins have specified. This sounds like a scenario where one person is responsible for chmod and another person is responsible for setfacl, and you want to be able to tell who did which part of it. Is that right? In general, it's going to be impossible to reliably tell the difference no matter which implementation choice we make. This is because setfacl is allowed to change permission bits. Calling setfacl --set user::rwx,group::r--,other::r-- on a file that has no extended ACL changes just the permission bits. Likewise, chmod is allowed to change extended ACL entries. It can change one specific extended ACL entry, the mask entry, if the file has an extended ACL. bq. We have to implement optimizations to get rid of minimal ACLs. Does this refer to the code paths where we check for ACL size 3? I don't see a way to eliminate this completely, but at least that logic is encapsulated behind {{AclStorage}}. There are 4 distinct scenarios that can cause reduction of an extended ACL to a minimal ACL: # {{FileSystem#removeAclEntries}} passes an ACL spec that selectively removes all of the extended ACL entries, both access and default. # {{FileSystem#removeDefaultAcl}} removes all default ACL entries, and there are no extended access ACL entries. # {{FileSystem#removeAcl}} removes all extended ACL entries. # {{FileSystem#setAcl}} on an inode that has an extended ACL and the caller passes an ACL spec with exactly 3 entries corresponding to a minimal ACL, overwriting the extended ACL with a minimal ACL. bq. It seems to me that in my proposal removeAclEntries cannot simply drops the ACL features in the inode. Am I missing something? 
In addition to removing the {{AclFeature}} and toggling the ACL bit in the {{FsPermission}}, we must also get the group permissions out of the old ACL and put it back into the {{FsPermission}}. (Recall that for an inode with an extended ACL, the {{FsPermission}} group bits are actually the mask, and this implementation choice simplifies integration with a lot of legacy APIs.) See {{TestNameNodeAcl#testRemoveAclEntriesMinimal}} for a unit test that shows an example of this. Initially, we set an extended ACL that includes user:foo:rwx. Because of this, the inferred mask is mask::rwx, and this gets stored into the group bits of {{FsPermission}}. After calling {{removeAclEntries}}, we reduce to a minimal ACL, and the group permissions get restored to group::rw-. If we simply dropped the {{AclFeature}}, then we'd still be left with group::rwx in the {{FsPermission}}, which would be incorrect. > NameNode: complete implementation of inode modifications for ACLs. > -- > > Key: HDFS-5758 > URL: https://issues.apache.org/jira/browse/HDFS-5758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch > > > This patch will complete the remaining logic for the ACL get and set APIs, > including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in > the inodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
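The mask-in-the-group-bits bookkeeping described above can be illustrated with a small standalone sketch. Everything here is a hypothetical simplification (a bare 9-bit rwxrwxrwx permission word, no real inode), not the actual {{AclStorage}} code: while an extended ACL exists, the group slot of the permission word holds the mask, and reducing to a minimal ACL must copy the real group entry back over it.

```java
// Simplified sketch of the mask-in-group-bits bookkeeping described above.
// All names are hypothetical; the real logic lives in AclStorage.
public class AclSketch {
    // Permission bits: owner (<<6), group-or-mask (<<3), other (<<0).
    public static int withGroupBits(int perm, int groupBits) {
        return (perm & ~(7 << 3)) | (groupBits << 3);
    }

    // While an extended ACL exists, the FsPermission group slot holds the mask.
    public static int storeMask(int perm, int mask) {
        return withGroupBits(perm, mask);
    }

    // Reducing to a minimal ACL must restore the real group entry over the
    // mask; simply dropping the ACL feature would leave the mask bits behind.
    public static int restoreGroup(int permWithMask, int realGroupBits) {
        return withGroupBits(permWithMask, realGroupBits);
    }

    public static void main(String[] args) {
        int perm = 0644;                        // rw-r--r--
        int masked = storeMask(perm, 7);        // user:foo:rwx infers mask::rwx
        int restored = restoreGroup(masked, 4); // back to group::r--
        System.out.println(Integer.toOctalString(masked) + " "
                + Integer.toOctalString(restored));
    }
}
```

This mirrors the {{testRemoveAclEntriesMinimal}} scenario: the mask overwrites the group slot while the extended ACL exists, and removal restores the original group bits.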
[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large
[ https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874082#comment-13874082 ] Suresh Srinivas commented on HDFS-5788: --- bq. Then due to lack of flow control in the RPC layer we can fill up the heap with these given a large enough average response buffer per call and enough clients. [~jlowe], thanks for the pointer. We can certainly reduce the number of files returned in each iteration, but that would increase the number of requests the NameNode has to process. > listLocatedStatus response can be very large > > > Key: HDFS-5788 > URL: https://issues.apache.org/jira/browse/HDFS-5788 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > > Currently we limit the size of listStatus requests to a default of 1000 > entries. This works fine except in the case of listLocatedStatus where the > location information can be quite large. As an example, a directory with 7000 > entries, 4 blocks each, 3 way replication - a listLocatedStatus response is > over 1MB. This can chew up very large amounts of memory in the NN if lots of > clients try to do this simultaneously. > Seems like it would be better if we also considered the amount of location > information being returned when deciding how many files to return. > Patch will follow shortly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
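The size-aware limiting proposed in the description can be sketched as a back-of-envelope estimator. The per-entry and per-location byte counts below are assumptions for illustration only (not measured protobuf sizes), chosen so the example directory (7000 entries, 4 blocks, 3-way replication) lands around the "over 1MB" figure quoted above.

```java
// Back-of-envelope estimate of a listLocatedStatus response size, matching
// the example in the description. BASE_BYTES_PER_ENTRY and BYTES_PER_LOCATION
// are assumed constants, not measured wire-format sizes.
public class ResponseSizeEstimate {
    static final int BASE_BYTES_PER_ENTRY = 64; // assumed fixed per-file cost
    static final int BYTES_PER_LOCATION = 8;    // assumed per-replica cost

    public static long estimate(int entries, int blocksPerFile, int replication) {
        long perEntry = BASE_BYTES_PER_ENTRY
                + (long) blocksPerFile * replication * BYTES_PER_LOCATION;
        return entries * perEntry;
    }

    public static void main(String[] args) {
        long bytes = estimate(7000, 4, 3);
        System.out.println(bytes + " bytes");
    }
}
```

With these assumed constants the example directory already exceeds a megabyte, which supports the proposal to cap responses by estimated bytes rather than by a fixed entry count.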
[jira] [Updated] (HDFS-5743) Use protobuf to serialize snapshot information
[ https://issues.apache.org/jira/browse/HDFS-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5743: Attachment: HDFS-5743.000.patch Initial patch. Still need to clean the code and fix unit tests. > Use protobuf to serialize snapshot information > -- > > Key: HDFS-5743 > URL: https://issues.apache.org/jira/browse/HDFS-5743 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Jing Zhao > Attachments: HDFS-5743.000.patch > > > This jira tracks the efforts of using protobuf to serialize snapshot-related > information in FSImage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5793) Optimize the serialization of PermissionStatus
[ https://issues.apache.org/jira/browse/HDFS-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874061#comment-13874061 ] Haohui Mai commented on HDFS-5793: -- I've tested the impact of this patch using a 512M fsimage on my machine. This patch reduces the loading time from 14s to 12s. It also reduces the size of the fsimage by 7%, from 508M to 469M. > Optimize the serialization of PermissionStatus > -- > > Key: HDFS-5793 > URL: https://issues.apache.org/jira/browse/HDFS-5793 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5793.000.patch > > > {{PermissionStatus}} contains the user name, the group name and the > permission. It is serialized into two strings and a short. > Note that the size of unique user / groups names are relatively small, thus > this format has some performance penalties. The names are stored multiple > times, increasing both the storage size and the overhead of reconstructing > the names. > This jira proposes to serialize {{PermissionStatus}} similar to its in-memory > layout. The code can record a mapping between user / group names and ids, and > pack user / group / permission into a single 64-bit long. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5793) Optimize the serialization of PermissionStatus
Haohui Mai created HDFS-5793: Summary: Optimize the serialization of PermissionStatus Key: HDFS-5793 URL: https://issues.apache.org/jira/browse/HDFS-5793 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5793.000.patch {{PermissionStatus}} contains the user name, the group name and the permission. It is serialized into two strings and a short. Note that the set of unique user / group names is relatively small, yet this format has some performance penalties: the names are stored multiple times, increasing both the storage size and the overhead of reconstructing the names. This jira proposes to serialize {{PermissionStatus}} in a way similar to its in-memory layout. The code can record a mapping between user / group names and ids, and pack user / group / permission into a single 64-bit long. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
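The pack-into-a-long idea above can be sketched with plain bit operations. The field widths here (24 bits for the user id, 24 for the group id, 16 for the permission) are illustrative assumptions, not the format the patch actually commits to.

```java
// Sketch of packing user id, group id, and permission into one 64-bit long,
// as HDFS-5793 proposes. The 24/24/16-bit field widths are assumptions for
// illustration, not the committed fsimage format.
public class PermissionLong {
    public static long pack(int userId, int groupId, int perm) {
        return ((long) (userId & 0xFFFFFF) << 40)
             | ((long) (groupId & 0xFFFFFF) << 16)
             | (perm & 0xFFFF);
    }
    public static int userId(long packed)  { return (int) (packed >>> 40) & 0xFFFFFF; }
    public static int groupId(long packed) { return (int) (packed >>> 16) & 0xFFFFFF; }
    public static int perm(long packed)    { return (int) (packed & 0xFFFF); }

    public static void main(String[] args) {
        long p = pack(12, 5, 0644);
        System.out.println(userId(p) + " " + groupId(p) + " "
                + Integer.toOctalString(perm(p)));
    }
}
```

Each serialized {{PermissionStatus}} then costs a fixed 8 bytes; the name-to-id tables would be written once elsewhere in the image instead of repeating every name per inode.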
[jira] [Updated] (HDFS-5793) Optimize the serialization of PermissionStatus
[ https://issues.apache.org/jira/browse/HDFS-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5793: - Attachment: HDFS-5793.000.patch > Optimize the serialization of PermissionStatus > -- > > Key: HDFS-5793 > URL: https://issues.apache.org/jira/browse/HDFS-5793 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5793.000.patch > > > {{PermissionStatus}} contains the user name, the group name and the > permission. It is serialized into two strings and a short. > Note that the size of unique user / groups names are relatively small, thus > this format has some performance penalties. The names are stored multiple > times, increasing both the storage size and the overhead of reconstructing > the names. > This jira proposes to serialize {{PermissionStatus}} similar to its in-memory > layout. The code can record a mapping between user / group names and ids, and > pack user / group / permission into a single 64-bit long. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874046#comment-13874046 ] Aaron T. Myers commented on HDFS-5709: -- The latest patch looks pretty good to me. Docs look great, thanks for adding those. Looks like you also didn't take my suggestion to move the path separator check to NN startup, but I'm guessing you had your reasons. +1 once you explain that, and pending a clean Jenkins run. > Improve upgrade with existing files and directories named ".snapshot" > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874037#comment-13874037 ] Haohui Mai commented on HDFS-5758: -- bq. Is there a benefit to that change (other than mimicking libacl)? I suppose we could change getAclStatus like that, but it does mean that all potential clients (not just getfacl) must do both a getFileStatus and a getAclStatus to get a complete picture of permissions. # It's difficult for the client to differentiate between the ACLs you've generated and the ones that the admins have specified. The client can easily calculate the logical ACL by itself if the permissions and extended ACLs are separated. # We have to implement optimizations to get rid of minimal ACLs. It seems to me that in my proposal {{removeAclEntries}} cannot simply drop the ACL features in the inode. Am I missing something? > NameNode: complete implementation of inode modifications for ACLs. > -- > > Key: HDFS-5758 > URL: https://issues.apache.org/jira/browse/HDFS-5758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch > > > This patch will complete the remaining logic for the ACL get and set APIs, > including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in > the inodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5434) Write resiliency for replica count 1
[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874026#comment-13874026 ] Eric Sirianni commented on HDFS-5434: - Arpit - after some discussion, Buddy and I realized that the approach in HDFS-5769 can't be applied to solve the pipeline recovery problem. In a shared-storage environment, the R/O node only has access to the {{hsync()}}-ed data (the data actually flushed to the shared storage device). However, pipeline recovery must guarantee that all data that has been {{ACK}}'d can be recovered. Therefore, recruiting a R/O node into a pipeline (as HDFS-5769 suggests) will not work for this case. Is our understanding correct? (If so I will also update HDFS-5769 accordingly) > Write resiliency for replica count 1 > > > Key: HDFS-5434 > URL: https://issues.apache.org/jira/browse/HDFS-5434 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Buddy >Priority: Minor > Attachments: BlockPlacementPolicyMinPipelineSize.java, > BlockPlacementPolicyMinPipelineSizeWithNodeGroup.java, > HDFS-5434-branch-2.patch, HDFS_5434.patch > > > If a file has a replica count of one, the HDFS client is exposed to write > failures if the data node fails during a write. With a pipeline of size of > one, no recovery is possible if the sole data node dies. > A simple fix is to force a minimum pipeline size of 2, while leaving the > replication count as 1. The implementation for this is fairly non-invasive. > Although the replica count is one, the block will be written to two data > nodes instead of one. If one of the data nodes fails during the write, normal > pipeline recovery will ensure that the write succeeds to the surviving data > node. > The existing code in the name node will prune the extra replica when it > receives the block received reports for the finalized block from both data > nodes. 
This results in the intended replica count of one for the block. > This behavior should be controlled by a configuration option such as > {{dfs.namenode.minPipelineSize}}. > This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by > ensuring that the pipeline size passed to > {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: > {code} > max(replication, ${dfs.namenode.minPipelineSize}) > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
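The {{max(replication, ${dfs.namenode.minPipelineSize})}} rule quoted above is small enough to sketch directly. The class and method names below are illustrative, not the real {{FSNamesystem}} code; the point is that only the pipeline size passed to the placement policy changes, while the stored replication factor stays at 1.

```java
// Sketch of the pipeline-size rule from the description: request
// max(replication, dfs.namenode.minPipelineSize) targets from the block
// placement policy without changing the file's replication factor.
// Names are illustrative, not the actual FSNamesystem code.
public class MinPipelineSize {
    public static int pipelineSize(int replication, int minPipelineSize) {
        return Math.max(replication, minPipelineSize);
    }

    public static void main(String[] args) {
        // Replication 1 with an assumed minPipelineSize of 2: the block is
        // written to two datanodes, and the namenode later prunes the extra
        // replica once both block-received reports arrive.
        System.out.println(pipelineSize(1, 2));
        // A larger replication factor dominates the minimum.
        System.out.println(pipelineSize(3, 2));
    }
}
```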
[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874024#comment-13874024 ] Chris Nauroth commented on HDFS-5758: - Is there a benefit to that change (other than mimicking libacl)? I suppose we could change {{getAclStatus}} like that, but it does mean that all potential clients (not just getfacl) must do both a {{getFileStatus}} and a {{getAclStatus}} to get a complete picture of permissions. I wasn't sure how {{removeAcl}} relates to that comment. {{removeAcl}} is changing because it needs to be able to restore the group permissions into {{FsPermission}} (overwriting the mask) as described above. This is also true for {{removeAclEntries}}, which can reduce an extended ACL to a minimal ACL if the ACL spec removes all extended entries. It was easiest to handle this consistently through {{AclTransformation}}. > NameNode: complete implementation of inode modifications for ACLs. > -- > > Key: HDFS-5758 > URL: https://issues.apache.org/jira/browse/HDFS-5758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch > > > This patch will complete the remaining logic for the ACL get and set APIs, > including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in > the inodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5709: -- Attachment: hdfs-5709-2.patch Thanks for the review ATM, here's a new patch. I took all your comments as advised except for the static FSN method; I feel like it's better to pass around FSN than to make a new Configuration in a static context. > Improve upgrade with existing files and directories named ".snapshot" > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5434) Write resiliency for replica count 1
[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873998#comment-13873998 ] Arpit Agarwal commented on HDFS-5434: - Buddy - I think this will be handled by the approach [~sirianni] and I discussed on HDFS-5318 for the 'shared storage' scenario. > Write resiliency for replica count 1 > > > Key: HDFS-5434 > URL: https://issues.apache.org/jira/browse/HDFS-5434 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Buddy >Priority: Minor > Attachments: BlockPlacementPolicyMinPipelineSize.java, > BlockPlacementPolicyMinPipelineSizeWithNodeGroup.java, > HDFS-5434-branch-2.patch, HDFS_5434.patch > > > If a file has a replica count of one, the HDFS client is exposed to write > failures if the data node fails during a write. With a pipeline of size of > one, no recovery is possible if the sole data node dies. > A simple fix is to force a minimum pipeline size of 2, while leaving the > replication count as 1. The implementation for this is fairly non-invasive. > Although the replica count is one, the block will be written to two data > nodes instead of one. If one of the data nodes fails during the write, normal > pipeline recovery will ensure that the write succeeds to the surviving data > node. > The existing code in the name node will prune the extra replica when it > receives the block received reports for the finalized block from both data > nodes. This results in the intended replica count of one for the block. > This behavior should be controlled by a configuration option such as > {{dfs.namenode.minPipelineSize}}. > This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by > ensuring that the pipeline size passed to > {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: > {code} > max(replication, ${dfs.namenode.minPipelineSize}) > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5730) Inconsistent Audit logging for HDFS APIs
[ https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873994#comment-13873994 ] Hadoop QA commented on HDFS-5730: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623446/HDFS-5730.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.TestSymlinkHdfsFileSystem org.apache.hadoop.fs.TestSymlinkHdfsFileContext org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.fs.TestResolveHdfsSymlink org.apache.hadoop.fs.TestSymlinkHdfsDisable org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5901//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5901//console This message is automatically generated. 
> Inconsistent Audit logging for HDFS APIs > > > Key: HDFS-5730 > URL: https://issues.apache.org/jira/browse/HDFS-5730 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-5730.patch > > > When looking at the audit loggs in HDFS, I am seeing some inconsistencies > what was logged with audit and what is added recently. > For more details please check the comments. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1
[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Buddy updated HDFS-5434: Attachment: HDFS-5434-branch-2.patch It looks like the consensus for 2.4 is to implement this with block placement policies. And these policies will not be part of the standard apache release. Unfortunately the default block placement policies are not currently extensible outside of the package. I've attached a patch that makes the constructors protected. > Write resiliency for replica count 1 > > > Key: HDFS-5434 > URL: https://issues.apache.org/jira/browse/HDFS-5434 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Buddy >Priority: Minor > Attachments: BlockPlacementPolicyMinPipelineSize.java, > BlockPlacementPolicyMinPipelineSizeWithNodeGroup.java, > HDFS-5434-branch-2.patch, HDFS_5434.patch > > > If a file has a replica count of one, the HDFS client is exposed to write > failures if the data node fails during a write. With a pipeline of size of > one, no recovery is possible if the sole data node dies. > A simple fix is to force a minimum pipeline size of 2, while leaving the > replication count as 1. The implementation for this is fairly non-invasive. > Although the replica count is one, the block will be written to two data > nodes instead of one. If one of the data nodes fails during the write, normal > pipeline recovery will ensure that the write succeeds to the surviving data > node. > The existing code in the name node will prune the extra replica when it > receives the block received reports for the finalized block from both data > nodes. This results in the intended replica count of one for the block. > This behavior should be controlled by a configuration option such as > {{dfs.namenode.minPipelineSize}}. 
> This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by > ensuring that the pipeline size passed to > {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: > {code} > max(replication, ${dfs.namenode.minPipelineSize}) > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5792) DFSOutputStream.close() throws exceptions with unintuitive stacktraces
Eric Sirianni created HDFS-5792: --- Summary: DFSOutputStream.close() throws exceptions with unintuitive stacktraces Key: HDFS-5792 URL: https://issues.apache.org/jira/browse/HDFS-5792 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Eric Sirianni Priority: Minor Given the following client code: {code} class Foo { void test() { FSDataOutputStream out = ...; out.write(...); out.close(); } } {code} A programmer would expect an exception thrown from {{out.close()}} to include the stack trace of the calling thread: {noformat} ... FSDataOutputStream.close() Foo.test() ... {noformat} Instead, it includes the stack trace from the {{DataStreamer}} thread: {noformat} java.io.IOException: All datanodes 127.0.0.1:49331 are bad. Aborting... at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) {noformat} This makes it difficult to debug the _client_ call stack that was actually unwound when the exception was thrown. A simple fix seems to be modifying {{DFSOutputStream.close()}} to wrap the {{lastException}} from the {{DataStreamer}} thread in a new {{Exception}}, thereby capturing both stack traces. I can work on a patch for this. Can someone confirm that my approach is acceptable? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
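The proposed fix can be sketched in isolation: an exception recorded on a background "streamer" thread is re-wrapped at {{close()}}, so the wrapper carries the caller's stack trace while the streamer-side trace survives as the cause. This is a standalone illustration with invented names, not the {{DFSOutputStream}} code itself.

```java
// Sketch of the wrap-on-close fix described above. The wrapper exception is
// constructed on the close() caller's thread, so its stack trace points at
// the client; the streamer's original exception remains the cause.
import java.io.IOException;

public class WrappedCloseSketch {
    private volatile IOException lastException;

    // Simulates the DataStreamer thread recording a failure.
    public void streamerFailed() {
        lastException = new IOException("All datanodes are bad. Aborting...");
    }

    public void close() throws IOException {
        IOException e = lastException;
        if (e != null) {
            // Wrapping here captures the caller's stack; the streamer-side
            // stack is preserved via the cause chain.
            throw new IOException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        WrappedCloseSketch out = new WrappedCloseSketch();
        out.streamerFailed();
        try {
            out.close();
        } catch (IOException e) {
            // e's own trace now points at main -> close();
            // e.getCause() holds the streamer-side exception.
            System.out.println(e.getCause().getMessage());
        }
    }
}
```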
[jira] [Commented] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873974#comment-13873974 ] Brandon Li commented on HDFS-5754: -- The unit test failure was not introduced by this patch. I've filed HDFS-5791 to track the unit test fix. > Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion > > > Key: HDFS-5754 > URL: https://issues.apache.org/jira/browse/HDFS-5754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Brandon Li > Attachments: HDFS-5754.001.patch, HDFS-5754.002.patch, > HDFS-5754.003.patch > > > Currently, LayoutVersion defines the on-disk data format and supported > features of the entire cluster including NN and DNs. LayoutVersion is > persisted in both NN and DNs. When a NN/DN starts up, it checks its > supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a > different LayoutVersion than NN cannot register with the NN. > We propose to split LayoutVersion into two independent values that are local > to the nodes: > - NamenodeLayoutVersion - defines the on-disk data format in NN, including > the format of FSImage, editlog and the directory structure. > - DatanodeLayoutVersion - defines the on-disk data format in DN, including > the format of block data file, metadata file, block pool layout, and the > directory structure. > The LayoutVersion check will be removed in DN registration. If > NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling > upgrade, then only rollback is supported and downgrade is not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
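The split proposed in the HDFS-5754 description above can be sketched as two independent enums, one per node type, with no cross-check at datanode registration. The feature names and version numbers here are illustrative only, not the real {{LayoutVersion}} values.

```java
// Sketch of the NamenodeLayoutVersion / DatanodeLayoutVersion split proposed
// in HDFS-5754. Feature names and numbers are illustrative assumptions.
public class LayoutVersions {
    enum NamenodeLayoutVersion {
        EDITLOG_ADD_BLOCK(-48), CACHING(-49);
        final int version;
        NamenodeLayoutVersion(int v) { version = v; }
    }

    enum DatanodeLayoutVersion {
        BLOCKPOOL_LAYOUT(-48);
        final int version;
        DatanodeLayoutVersion(int v) { version = v; }
    }

    // Layout versions grow more negative over time: a feature is supported if
    // the on-disk version is at or below the version that introduced it.
    public static boolean supports(int introducedIn, int onDiskVersion) {
        return onDiskVersion <= introducedIn;
    }

    public static void main(String[] args) {
        System.out.println(supports(NamenodeLayoutVersion.CACHING.version, -50));
        System.out.println(supports(NamenodeLayoutVersion.CACHING.version, -48));
    }
}
```

Because the two enums are independent, a namenode-only format change bumps only {{NamenodeLayoutVersion}}, and datanodes no longer need a matching value to register.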
[jira] [Created] (HDFS-5791) TestHttpsFileSystem should use a random port to avoid binding error during testing
Brandon Li created HDFS-5791: Summary: TestHttpsFileSystem should use a random port to avoid binding error during testing Key: HDFS-5791 URL: https://issues.apache.org/jira/browse/HDFS-5791 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Brandon Li {noformat} org.apache.hadoop.hdfs.web.TestHttpsFileSystem.org.apache.hadoop.hdfs.web.TestHttpsFileSystem Failing for the past 1 build (Since Failed#5900 ) Took 2.7 sec. Error Message Port in use: localhost:50475 Stacktrace java.net.BindException: Port in use: localhost:50475 at java.net.PlainSocketImpl.socketBind(Native Method) at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:383) at java.net.ServerSocket.bind(ServerSocket.java:328) at java.net.ServerSocket.(ServerSocket.java:194) at javax.net.ssl.SSLServerSocket.(SSLServerSocket.java:106) at com.sun.net.ssl.internal.ssl.SSLServerSocketImpl.(SSLServerSocketImpl.java:108) at com.sun.net.ssl.internal.ssl.SSLServerSocketFactoryImpl.createServerSocket(SSLServerSocketFactoryImpl.java:72) at org.mortbay.jetty.security.SslSocketConnector.newServerSocket(SslSocketConnector.java:478) at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73) at org.apache.hadoop.http.HttpServer.openListeners(HttpServer.java:973) at org.apache.hadoop.http.HttpServer.start(HttpServer.java:914) at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:413) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:770) at org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:316) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1847) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1747) at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1217) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:683) at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:351) at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:332) at org.apache.hadoop.hdfs.web.TestHttpsFileSystem.setUp(TestHttpsFileSystem.java:64) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
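The fix suggested by the issue title is the standard one: bind to port 0 so the OS assigns a free ephemeral port instead of hard-coding 50475. A minimal standalone {{java.net}} illustration, not the actual MiniDFSCluster/Jetty configuration:

```java
// Sketch of binding to an OS-assigned ephemeral port, as the issue title
// suggests for TestHttpsFileSystem. Standalone java.net illustration only.
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPort {
    public static int bindEphemeral() {
        // Port 0 asks the OS for any free port, avoiding "Port in use" races
        // between concurrent test runs.
        try (ServerSocket ss = new ServerSocket(0)) {
            return ss.getLocalPort();
        } catch (IOException e) {
            return -1; // no socket available
        }
    }

    public static void main(String[] args) {
        System.out.println("bound to port " + bindEphemeral());
    }
}
```

The test would then read the actual port back from the server after startup rather than assuming a fixed number.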
[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873945#comment-13873945 ] Haohui Mai commented on HDFS-5758: -- I prefer not to bake the concept of minimal / logical ACLs into the APIs. Instead, {{getAclStatus}} / {{removeAcl}} should only handle extended ACLs. It is only the access-control code that needs to take care of logical ACLs. The clients, notably {{getfacl}}, can generate the logical ACL by reading the permissions and the extended ACL. That's how libacl under Linux works. > NameNode: complete implementation of inode modifications for ACLs. > -- > > Key: HDFS-5758 > URL: https://issues.apache.org/jira/browse/HDFS-5758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch > > > This patch will complete the remaining logic for the ACL get and set APIs, > including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in > the inodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873943#comment-13873943 ] Hadoop QA commented on HDFS-5754: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623435/HDFS-5754.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestHttpsFileSystem {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5900//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5900//console This message is automatically generated. > Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion > > > Key: HDFS-5754 > URL: https://issues.apache.org/jira/browse/HDFS-5754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Brandon Li > Attachments: HDFS-5754.001.patch, HDFS-5754.002.patch, > HDFS-5754.003.patch > > > Currently, LayoutVersion defines the on-disk data format and supported > features of the entire cluster including NN and DNs. 
LayoutVersion is > persisted in both NN and DNs. When a NN/DN starts up, it checks its > supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a > different LayoutVersion than NN cannot register with the NN. > We propose to split LayoutVersion into two independent values that are local > to the nodes: > - NamenodeLayoutVersion - defines the on-disk data format in NN, including > the format of FSImage, editlog and the directory structure. > - DatanodeLayoutVersion - defines the on-disk data format in DN, including > the format of block data file, metadata file, block pool layout, and the > directory structure. > The LayoutVersion check will be removed in DN registration. If > NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling > upgrade, then only rollback is supported and downgrade is not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section
[ https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5784: --- Attachment: HDFS-5784.003.patch rebase > reserve space in edit log header and fsimage header for feature flag section > > > Key: HDFS-5784 > URL: https://issues.apache.org/jira/browse/HDFS-5784 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, > HDFS-5784.003.patch > > > We should reserve space in the edit log header and fsimage header so that we > can add layout feature flags later in a compatible manner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
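The compatible-header idea in HDFS-5784 can be sketched as follows. This is an illustrative, self-contained sketch, not the actual patch; the class and field names here are hypothetical. The key trick is a length-prefixed feature-flag section: an old reader that does not understand the flags can still skip the section wholesale and keep parsing.

```java
import java.io.*;
import java.util.*;

public class HeaderSketch {
    // Write a header with a length-prefixed feature-flag section. The length
    // prefix is the "reserved space": readers that do not understand the
    // section can skip exactly that many bytes.
    static byte[] writeHeader(int layoutVersion, List<String> featureFlags) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(layoutVersion);

            // Serialize the flags into a side buffer so we know their length.
            ByteArrayOutputStream flagBytes = new ByteArrayOutputStream();
            DataOutputStream flagOut = new DataOutputStream(flagBytes);
            for (String flag : featureFlags) {
                flagOut.writeUTF(flag);
            }
            out.writeInt(flagBytes.size());   // section length prefix
            flagBytes.writeTo(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // cannot happen on byte arrays
        }
    }

    // A reader that knows nothing about feature flags: it reads the layout
    // version, then skips the flag section using the length prefix.
    static int readLayoutVersionSkippingFlags(byte[] header) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
            int layoutVersion = in.readInt();
            int sectionLen = in.readInt();
            in.skipBytes(sectionLen);         // unknown flags ignored, compatibly
            return layoutVersion;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A new writer can add flags without breaking an old reader, which is exactly the compatibility property the reserved section is meant to buy.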
[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery
[ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873921#comment-13873921 ] Todd Lipcon commented on HDFS-5790: --- Looking at the code, it's not obvious why we need to do: {code} src = leaseManager.findPath(pendingFile); {code} as opposed to something like: {code} src = pendingFile.getFullPathName(); {code} since in theory the inode path and the lease path should always be kept in sync. Same is true of the same call in commitOrCompleteLastBlock() > LeaseManager.findPath is very slow when many leases need recovery > - > > Key: HDFS-5790 > URL: https://issues.apache.org/jira/browse/HDFS-5790 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, performance >Affects Versions: 2.4.0 >Reporter: Todd Lipcon > > We recently saw an issue where the NN restarted while tens of thousands of > files were open. The NN then ended up spending multiple seconds for each > commitBlockSynchronization() call, spending most of its time inside > LeaseManager.findPath(). findPath currently works by looping over all files > held for a given writer, and traversing the filesystem for each one. This > takes way too long when tens of thousands of files are open by a single > writer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery
[ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873917#comment-13873917 ] Todd Lipcon commented on HDFS-5790: --- The handler threads were spending most of their time in this stack trace: {code} "IPC Server handler 1 on 8055" daemon prio=10 tid=0x2ab5c87bc800 nid=0x71dc runnable [0x56e93000] java.lang.Thread.State: RUNNABLE at java.util.Collections.indexedBinarySearch(Collections.java:215) at java.util.Collections.binarySearch(Collections.java:201) at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.getChildINode(INodeDirectory.java:107) at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.getExistingPathINodes(INodeDirectory.java:211) at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.getNode(INodeDirectory.java:121) at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.getNode(INodeDirectory.java:130) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINode(FSDirectory.java:1247) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileINode(FSDirectory.java:1234) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Lease.findPath(LeaseManager.java:256) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Lease.access$300(LeaseManager.java:225) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.findPath(LeaseManager.java:186) - locked <0x2aaac5a38b48> (a org.apache.hadoop.hdfs.server.namenode.LeaseManager) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:3229) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:560) {code} (line numbers may be slightly off, since this is from an older release, but code appears to still be structured approximately the same in trunk) > LeaseManager.findPath is very slow when many leases need recovery > - > > Key: HDFS-5790 > URL: https://issues.apache.org/jira/browse/HDFS-5790 > Project: Hadoop HDFS 
> Issue Type: Bug > Components: namenode, performance >Affects Versions: 2.4.0 >Reporter: Todd Lipcon > > We recently saw an issue where the NN restarted while tens of thousands of > files were open. The NN then ended up spending multiple seconds for each > commitBlockSynchronization() call, spending most of its time inside > LeaseManager.findPath(). findPath currently works by looping over all files > held for a given writer, and traversing the filesystem for each one. This > takes way too long when tens of thousands of files are open by a single > writer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
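Todd's suggestion above — deriving the path from the inode's parent pointers rather than re-resolving every lease path from the root — can be illustrated with a toy sketch (not HDFS code; the class here is hypothetical). Walking up from the inode is O(depth) with no binary search over directory children, which is where the stack trace shows the NameNode spending its time.

```java
import java.util.*;

// Toy inode with a parent pointer. getFullPathName() walks up to the root,
// touching only the inode's ancestors, instead of resolving the path
// top-down with a binary search at each directory level (the
// Collections.indexedBinarySearch frames in the stack trace above).
class Inode {
    final String name;
    final Inode parent;

    Inode(String name, Inode parent) {
        this.name = name;
        this.parent = parent;
    }

    String getFullPathName() {
        Deque<String> parts = new ArrayDeque<>();
        // Stop at the root, which is modeled as the inode with a null parent.
        for (Inode n = this; n.parent != null; n = n.parent) {
            parts.addFirst(n.name);
        }
        return "/" + String.join("/", parts);
    }
}
```

With tens of thousands of open files per writer, replacing a per-file root-to-leaf lookup with this per-file upward walk removes the quadratic-feeling blowup described in the report.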
[jira] [Created] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery
Todd Lipcon created HDFS-5790: - Summary: LeaseManager.findPath is very slow when many leases need recovery Key: HDFS-5790 URL: https://issues.apache.org/jira/browse/HDFS-5790 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 2.4.0 Reporter: Todd Lipcon We recently saw an issue where the NN restarted while tens of thousands of files were open. The NN then ended up spending multiple seconds for each commitBlockSynchronization() call, spending most of its time inside LeaseManager.findPath(). findPath currently works by looping over all files held for a given writer, and traversing the filesystem for each one. This takes way too long when tens of thousands of files are open by a single writer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873865#comment-13873865 ] Chris Nauroth commented on HDFS-5758: - bq. It seems to me that you can enforce the permission by the following: Yes, this is basically what the HDFS-5612 enforcement patch does, except we can't run the existing logic for checking {{FsPermission}} at all if an inode has an ACL, so it would be more like: {code} if (hasAcl) { decide whether acl allows the access } else { checkFsPermission(); } {code} Always checking {{FsPermission}} before the ACL could erroneously deny access to someone who had been granted access through a named user or named group ACL entry. bq. I'm not sure why AclStorage#updateINodeAcl and AclStorage#readINodeAcl need to construct entries from the old permissions. There are a few different reasons for this: # Every inode in the system really has a logical ACL. Even if the ACL bit is off and there is no {{AclFeature}}, the inode still has a minimal ACL. Calling {{getAclStatus}} on an inode with no ACL bit/no {{AclFeature}} still needs to return the minimal ACL, so we construct 3 ACL entries based on the permission bits. # For an inode that has an ACL, the expected behavior of chmod is that it changes the mask entry instead of the group permission bits. Likewise, file listings should show the mask permissions in place of the group permissions. By choosing to store the mask permission inside the group permission bits, we minimize the impact on existing code like {{setPermission}} and {{listStatus}}. Those APIs continue to read/write bits in {{FsPermission}} without needing awareness that they are logically reading/writing the mask. Only the ACL APIs need this awareness. # Updates to the owner and other ACL entries are also supposed to update the corresponding owner and other permission bits. 
For symmetry, if chmod changes those permissions, then the change is also supposed to be visible in the corresponding ACL entries. Similar to the above, by choosing to reuse the corresponding {{FsPermission}} bits for this purpose, then a lot of existing code just works with no changes. We also prevent the possibility of bugs in trying to keep 2 data sources in sync (one copy in {{FsPermission}} and one copy in an {{AclEntry}} instance). # Finally, I expect a minor storage optimization as a result of this strategy. We're keeping 3 of the "logical" ACL entries in {{FsPermission}}, so that's 3 fewer {{AclEntry}} instances to store inside each {{AclFeature}}. Additionally, this will increase the likelihood of de-duplication ("Global ACL Set" as described in the design doc) once we implement that. If 2 ACLs differ only in their owner, mask or other entries, then their underlying {{AclFeature}} instances will still have the same contents, and we can de-duplicate them. > NameNode: complete implementation of inode modifications for ACLs. > -- > > Key: HDFS-5758 > URL: https://issues.apache.org/jira/browse/HDFS-5758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch > > > This patch will complete the remaining logic for the ACL get and set APIs, > including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in > the inodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
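Chris's first point — that every inode has a logical minimal ACL derivable from its permission bits — can be sketched as follows. This is an illustrative sketch only; the types and entry format here are hypothetical and not the HDFS {{AclEntry}} API. The three entries come straight from the user/group/other triplets of the mode.

```java
import java.util.*;

// Sketch: derive the minimal three-entry ACL from 9-bit permission bits.
// When an AclFeature is present, the group triplet is reinterpreted as the
// mask entry, per the storage scheme described in the comment above.
public class MinimalAcl {
    static List<String> fromMode(int mode) {
        return Arrays.asList(
            "user::"  + rwx((mode >> 6) & 7),   // owner triplet
            "group::" + rwx((mode >> 3) & 7),   // group triplet (mask if ACL set)
            "other::" + rwx(mode & 7));         // other triplet
    }

    static String rwx(int bits) {
        return ((bits & 4) != 0 ? "r" : "-")
             + ((bits & 2) != 0 ? "w" : "-")
             + ((bits & 1) != 0 ? "x" : "-");
    }
}
```

Because the owner, mask, and other entries live in {{FsPermission}} rather than in the {{AclFeature}}, a {{getAclStatus}} on a plain file can synthesize exactly these three entries with no extra storage.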
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873859#comment-13873859 ] Hadoop QA commented on HDFS-5318: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623422/HDFS-5318b-branch-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5899//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5899//console This message is automatically generated. 
> Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, > HDFS-5318b-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. A minor modification would be required in the > DataNode to enable the generation of {{StorageID}} s to be pluggable behind > the {{FsDatasetSpi}} interface. > With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}} s associated with that block. > Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. 
the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
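The proposed counting rule can be sketched in a few lines (illustrative only; the names here are hypothetical, not the actual NameNode data structures): the physical replication count of a block is the number of distinct storage IDs among its (datanode, storage) location pairs.

```java
import java.util.*;

// Sketch of the shared-storage counting rule: datanodes mounting the same
// shared pool report the same storage ID, so counting distinct storage IDs
// counts physical copies rather than access paths.
public class SharedStorageCount {
    // Each location is a {datanodeId, storageId} pair.
    static int physicalReplicaCount(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] dnAndStorage : locations) {
            storageIds.add(dnAndStorage[1]);   // index 1 = storage ID
        }
        return storageIds.size();
    }
}
```

Running this on the four location tuples from the example above yields 2, matching the replication factor the proposal says the NameNode should compute.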
[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large
[ https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873839#comment-13873839 ] Jason Lowe commented on HDFS-5788: -- They are usually short-lived but a bit longer-lived when we can't push them out the network in a timely manner. Then due to lack of flow control in the RPC layer we can fill up the heap with these given a large enough average response buffer per call and enough clients. See HADOOP-8942. This change mitigates the issue for listLocatedStatus since a much smaller response payload means it takes a lot more simultaneous clients to consume an equal amount of heap space. > listLocatedStatus response can be very large > > > Key: HDFS-5788 > URL: https://issues.apache.org/jira/browse/HDFS-5788 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > > Currently we limit the size of listStatus requests to a default of 1000 > entries. This works fine except in the case of listLocatedStatus where the > location information can be quite large. As an example, a directory with 7000 > entries, 4 blocks each, 3 way replication - a listLocatedStatus response is > over 1MB. This can chew up very large amounts of memory in the NN if lots of > clients try to do this simultaneously. > Seems like it would be better if we also considered the amount of location > information being returned when deciding how many files to return. > Patch will follow shortly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
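The location-aware batching suggested in HDFS-5788 can be sketched like this (an illustrative sketch under assumed names; the budget value and method names are hypothetical, not the committed patch): instead of a fixed 1000-entry limit, the listing batch closes once the cumulative number of block locations crosses a budget, so directories full of heavily-replicated, many-block files produce small responses.

```java
import java.util.*;

// Sketch: size a listLocatedStatus batch by location count, not entry count.
public class LocatedListingBatcher {
    // blockLocationsPerFile.get(i) = number of (block x replica) locations
    // that would be returned for file i in directory order.
    static int batchSize(List<Integer> blockLocationsPerFile, int locationBudget) {
        int used = 0;
        int count = 0;
        for (int locs : blockLocationsPerFile) {
            used += Math.max(locs, 1);          // charge at least 1 per entry
            count++;
            if (used >= locationBudget) {
                break;                          // batch full; client resumes here
            }
        }
        return count;
    }
}
```

Small files still come back in large batches, while a directory like the 7000-entry, 12-locations-per-file example above gets chopped into many small responses, capping per-call heap usage on the NameNode.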
[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section
[ https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873832#comment-13873832 ] Hadoop QA commented on HDFS-5784: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623450/HDFS-5784.002.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5902//console This message is automatically generated. > reserve space in edit log header and fsimage header for feature flag section > > > Key: HDFS-5784 > URL: https://issues.apache.org/jira/browse/HDFS-5784 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch > > > We should reserve space in the edit log header and fsimage header so that we > can add layout feature flags later in a compatible manner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section
[ https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5784: --- Attachment: HDFS-5784.002.patch > reserve space in edit log header and fsimage header for feature flag section > > > Key: HDFS-5784 > URL: https://issues.apache.org/jira/browse/HDFS-5784 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch > > > We should reserve space in the edit log header and fsimage header so that we > can add layout feature flags later in a compatible manner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873817#comment-13873817 ] Daryn Sharp commented on HDFS-4564: --- This bug will trigger the NPE described in HADOOP-9363. The authenticator handling is flawed because the JDK will transparently perform SPNEGO auth, but webhdfs will unnecessarily interpret a login failure as a fallback to pseudo auth. This triggers another JDK bug resulting in a replay attack. For some reason, if there are 16 persistent connections open (webhdfs tries to disconnect, but the HttpURLConnection javadocs indicate it may not honor the close, and it certainly does not per tcpdump), the com.sun code will send the OPTIONS request to a cached connection, and also immediately open a new connection to send the same OPTIONS request. The new connection reuses the cached connection's service ticket and boom - replay attack. > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples include rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp reassigned HDFS-4564: - Assignee: Daryn Sharp > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples include rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873815#comment-13873815 ] Haohui Mai commented on HDFS-5758: -- It seems to me that you can enforce the permission by the following: {code} isAccessAllowed(): checkFsPermission(); if (hasAcl) { decide whether acl allows the access } {code} I'm not sure why AclStorage#updateINodeAcl and AclStorage#readINodeAcl need to construct entries from the old permissions. > NameNode: complete implementation of inode modifications for ACLs. > -- > > Key: HDFS-5758 > URL: https://issues.apache.org/jira/browse/HDFS-5758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch > > > This patch will complete the remaining logic for the ACL get and set APIs, > including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in > the inodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5789) Some of snapshot, Cache APIs missing checkOperation double check in fsn
Uma Maheswara Rao G created HDFS-5789: - Summary: Some of snapshot, Cache APIs missing checkOperation double check in fsn Key: HDFS-5789 URL: https://issues.apache.org/jira/browse/HDFS-5789 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G HDFS-4591 introduced a double check of the HA state while taking the fsn lock: checkOperation is called before actually taking the lock and again after acquiring it. This pattern is missing in some of the snapshot APIs and cache-management-related APIs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5730) Inconsistent Audit logging for HDFS APIs
[ https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-5730: -- Attachment: HDFS-5730.patch I have attached an initial version of the patch here. > Inconsistent Audit logging for HDFS APIs > > > Key: HDFS-5730 > URL: https://issues.apache.org/jira/browse/HDFS-5730 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-5730.patch > > > When looking at the audit logs in HDFS, I am seeing some inconsistencies > between what was logged with audit and what was added recently. > For more details please check the comments. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5730) Inconsistent Audit logging for HDFS APIs
[ https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-5730: -- Status: Patch Available (was: Open) > Inconsistent Audit logging for HDFS APIs > > > Key: HDFS-5730 > URL: https://issues.apache.org/jira/browse/HDFS-5730 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0, 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-5730.patch > > > When looking at the audit logs in HDFS, I am seeing some inconsistencies > between what was logged with audit and what was added recently. > For more details please check the comments. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5153) Datanode should send block reports for each storage in a separate message
[ https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873748#comment-13873748 ] Hadoop QA commented on HDFS-5153: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623413/HDFS-5153.03b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5898//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5898//console This message is automatically generated. 
> Datanode should send block reports for each storage in a separate message > - > > Key: HDFS-5153 > URL: https://issues.apache.org/jira/browse/HDFS-5153 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, > HDFS-5153.03b.patch > > > When the number of blocks on the DataNode grows large we start running into a > few issues: > # Block reports take a long time to process on the NameNode. In testing we > have seen that a block report with 6 Million blocks takes close to one second > to process on the NameNode. The NameSystem write lock is held during this > time. > # We start hitting the default protobuf message limit of 64MB somewhere > around 10 Million blocks. While we can increase the message size limit it > already takes over 7 seconds to serialize/unserialize a block report of this > size. > HDFS-2832 has introduced the concept of a DataNode as a collection of > storages i.e. the NameNode is aware of all the volumes (storage directories) > attached to a given DataNode. This makes it easy to split block reports from > the DN by sending one report per storage directory to mitigate the above > problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
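The splitting described in HDFS-5153 can be sketched in plain Java (illustrative only; types and method names are hypothetical, not the DataNode's actual report path): group the blocks by the storage (volume) they live on and emit one report per storage, instead of one huge message for the whole DataNode.

```java
import java.util.*;

// Sketch: split a full block report into one report per storage directory,
// keeping each message well under the protobuf size limit and shortening
// the time the NameNode holds its write lock per report.
public class PerStorageReports {
    // blockToStorage maps block ID -> storage ID of the volume holding it.
    static Map<String, List<Long>> splitByStorage(Map<Long, String> blockToStorage) {
        Map<String, List<Long>> reports = new HashMap<>();
        for (Map.Entry<Long, String> e : blockToStorage.entrySet()) {
            reports.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                   .add(e.getKey());
        }
        return reports;   // each entry would become a separate RPC to the NN
    }
}
```

With N storages, a 6-million-block report becomes N messages of roughly 1/N the size each, mitigating both the lock-hold time and the 64MB message-limit problems described above.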
[jira] [Updated] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5754: - Attachment: HDFS-5754.003.patch > Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion > > > Key: HDFS-5754 > URL: https://issues.apache.org/jira/browse/HDFS-5754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Brandon Li > Attachments: HDFS-5754.001.patch, HDFS-5754.002.patch, > HDFS-5754.003.patch > > > Currently, LayoutVersion defines the on-disk data format and supported > features of the entire cluster including NN and DNs. LayoutVersion is > persisted in both NN and DNs. When a NN/DN starts up, it checks its > supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a > different LayoutVersion than NN cannot register with the NN. > We propose to split LayoutVersion into two independent values that are local > to the nodes: > - NamenodeLayoutVersion - defines the on-disk data format in NN, including > the format of FSImage, editlog and the directory structure. > - DatanodeLayoutVersion - defines the on-disk data format in DN, including > the format of block data file, metadata file, block pool layout, and the > directory structure. > The LayoutVersion check will be removed in DN registration. If > NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling > upgrade, then only rollback is supported and downgrade is not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-4600) HDFS file append failing in multinode cluster
[ https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik resolved HDFS-4600. -- Resolution: Invalid The issue seems to be caused by a specific cluster configuration rather than a software problem. Closing. > HDFS file append failing in multinode cluster > - > > Key: HDFS-4600 > URL: https://issues.apache.org/jira/browse/HDFS-4600 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Roman Shaposhnik > Attachments: X.java, core-site.xml, hdfs-site.xml > > > NOTE: the following only happens in a fully distributed setup (core-site.xml > and hdfs-site.xml are attached) > Steps to reproduce: > {noformat} > $ javac -cp /usr/lib/hadoop/client/\* X.java > $ echo a > a.txt > $ hadoop fs -ls /tmp/a.txt > ls: `/tmp/a.txt': No such file or directory > $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt > 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception > java.io.IOException: Failed to replace a bad datanode on the existing > pipeline due to no more good datanodes being available to try. (Nodes: > current=[10.10.37.16:50010, 10.80.134.126:50010], > original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed > datanode replacement policy is DEFAULT, and a client may configure this via > 'dfs.client.block.write.replace-datanode-on-failure.policy' in its > configuration. > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470) > Exception in thread "main" java.io.IOException: Failed to replace a bad > datanode on the existing pipeline due to no more good datanodes being > available to try. 
(Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], > original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed > datanode replacement policy is DEFAULT, and a client may configure this via > 'dfs.client.block.write.replace-datanode-on-failure.policy' in its > configuration. > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470) > 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt > java.io.IOException: Failed to replace a bad datanode on the existing > pipeline due to no more good datanodes being available to try. (Nodes: > current=[10.10.37.16:50010, 10.80.134.126:50010], > original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed > datanode replacement policy is DEFAULT, and a client may configure this via > 'dfs.client.block.write.replace-datanode-on-failure.policy' in its > configuration. > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470) > {noformat} > Given that the file actually does get created: > {noformat} > $ hadoop fs -ls /tmp/a.txt > Found 1 items > -rw-r--r-- 3 root hadoop 6 2013-03-13 16:05 /tmp/a.txt > {noformat} > this feels like a regression in APPEND's functionality. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster
[ https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873728#comment-13873728 ] Konstantin Boudnik commented on HDFS-4600: -- Actually you're right. I've re-read the history of the ticket and will close it right away. Please disregard my last comment ;)
[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large
[ https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873720#comment-13873720 ] Suresh Srinivas commented on HDFS-5788: --- bq. a listLocatedStatus response is over 1MB These are short-lived objects and are garbage collected in the young generation. Does this cause a lot of issues? bq. Seems like it would be better if we also considered the amount of location information being returned when deciding how many files to return. Can you please add details about the solution? > listLocatedStatus response can be very large > > > Key: HDFS-5788 > URL: https://issues.apache.org/jira/browse/HDFS-5788 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > > Currently we limit the size of listStatus requests to a default of 1000 > entries. This works fine except in the case of listLocatedStatus where the > location information can be quite large. As an example, a directory with 7000 > entries, 4 blocks each, 3-way replication - a listLocatedStatus response is > over 1MB. This can chew up very large amounts of memory in the NN if lots of > clients try to do this simultaneously. > Seems like it would be better if we also considered the amount of location > information being returned when deciding how many files to return. > Patch will follow shortly.
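The sizing idea in the description can be sketched concretely. The following is an illustrative model only - the limits and method names are hypothetical, not the actual NameNode code - showing a listing batch capped by cumulative location records as well as by entry count:

```java
import java.util.Arrays;

public class LocatedListingLimiter {
    static final int MAX_ENTRIES = 1000;     // existing dfs.ls.limit-style cap
    static final int MAX_LOCATIONS = 10_000; // hypothetical additional cap on location records

    // blocksPerFile[i] * replication approximates the location records for file i
    static int entriesToReturn(int[] blocksPerFile, int replication) {
        int entries = 0, locations = 0;
        for (int blocks : blocksPerFile) {
            locations += blocks * replication;
            // Always return at least one entry so the client can make progress.
            if (entries > 0 && (entries >= MAX_ENTRIES || locations > MAX_LOCATIONS)) {
                break;
            }
            entries++;
        }
        return entries;
    }

    public static void main(String[] args) {
        int[] dir = new int[7000];
        Arrays.fill(dir, 4); // 7000 files, 4 blocks each, as in the report
        // With 3-way replication each file carries 12 location records, so the
        // location budget is exhausted before the 1000-entry limit is reached.
        System.out.println(entriesToReturn(dir, 3));
    }
}
```

The client would then issue follow-up partial-listing calls as it already does today; only the batch size shrinks for location-heavy directories.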
[jira] [Commented] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.
[ https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873722#comment-13873722 ] Chris Nauroth commented on HDFS-5608: - Thanks, Sachin. I'll check out the new patch. {quote} Current implementation : mask::rw,other::rwx expected - mask:rw,other:rwx {quote} There are multiple ways in which our ACL spec syntax is a subset of the syntax accepted by Linux setfacl. I just checked Linux, and it accepts both of the above. Right now, our implementation only accepts one of them: mask::rw-. I wasn't aware of this particular difference, so thank you for pointing it out. Let's stick with our minimal supported syntax for now, because it's consistent with the display of getfacl. This choice doesn't limit the functionality we provide. It just means that some of the syntax shortcuts available on Linux aren't available here. For the record, here are all of the syntax aspects supported on Linux that we don't yet support (that I'm aware of). We can revisit later if we want to fully support all of this, but it's not critical for an initial implementation.
# The scope default can be shortened to d.
# The types user/group/mask/other can be shortened to u/g/m/o respectively.
# Permissions can be specified partially, with anything omitted assumed to be off. For example, mask::r-x can be shortened to mask::rx.
# Permissions can be an octal digit 0-7.
# Whitespace between delimiter characters and non-delimiter characters is ignored.
{quote} Where can we define the common method which can be accessed from both projects (hdfs, common)? {quote} Right now, I'm thinking that we can define a static method on the {{AclEntry}} class. HADOOP-10213 is still open for some changes in the CLI parsing, so I'm going to discuss it over there with [~vinayrpet] and you. > WebHDFS: implement GETACLSTATUS and SETACL. 
> --- > > Key: HDFS-5608 > URL: https://issues.apache.org/jira/browse/HDFS-5608 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Sachin Jose > Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch, HDFS-5608.2.patch > > > Implement and test {{GETACLSTATUS}} and {{SETACL}} in WebHDFS.
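A minimal validator for the restricted spec syntax discussed above might look like the following sketch (illustrative only, not the actual {{AclEntry}} parser): it accepts full-form entries such as mask::rw- and rejects the Linux shortcuts enumerated in the comment.

```java
public class MinimalAclSpec {
    // Accepts exactly type:name:perm with a full three-character "rwx"-style
    // permission string, matching getfacl's display form. Partial permissions,
    // short type names, octal digits, and stray whitespace are all rejected.
    static boolean isValid(String entry) {
        String[] parts = entry.split(":", -1);
        if (parts.length != 3) {
            return false;
        }
        boolean typeOk = parts[0].equals("user") || parts[0].equals("group")
                || parts[0].equals("mask") || parts[0].equals("other");
        boolean permOk = parts[2].matches("[r-][w-][x-]");
        return typeOk && permOk;
    }

    public static void main(String[] args) {
        System.out.println(isValid("mask::rw-")); // true
        System.out.println(isValid("mask::rw"));  // false: partial perms not supported
        System.out.println(isValid("m::rw-"));    // false: short type names not supported
    }
}
```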
[jira] [Commented] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873719#comment-13873719 ] Kousuke Saruta commented on HDFS-5761: -- Thanks for your comment, Uma. At first, I thought the same as you. I thought it would be good to branch the logic depending on whether the checksum type is NULL or not. But, on second thought, BlockPoolSlice should not have logic which depends on a specific checksum algorithm. How to verify is the responsibility of each checksum algorithm. > DataNode fails to validate integrity for checksum type NULL when DataNode > recovers > --- > > Key: HDFS-5761 > URL: https://issues.apache.org/jira/browse/HDFS-5761 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta > Attachments: HDFS-5761.patch > > > When the DataNode goes down while writing blocks, the blocks are not finalized > and the next time the DataNode recovers, integrity validation will run. > But if we use NULL for the checksum algorithm (we can set NULL via > dfs.checksum.type), the DataNode will fail to validate integrity and cannot start > up. > The cause is in BlockPoolSlice#validateIntegrity. > In the method, there is the following code. > {code} > long numChunks = Math.min( > (blockFileLen + bytesPerChecksum - 1)/bytesPerChecksum, > (metaFileLen - crcHeaderLen)/checksumSize); > {code} > When we choose the NULL checksum, checksumSize is 0, so an ArithmeticException will > be thrown and the DataNode cannot start up.
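The failing division can be reproduced and guarded in isolation. The following is a standalone sketch (the class and the guard are illustrative, not the actual BlockPoolSlice code; as the comment above argues, the preferred fix may be to push verification into each checksum algorithm rather than branch here):

```java
public class NumChunksSketch {
    // Simplified model of the chunk-count computation quoted in the description.
    static long numChunks(long blockFileLen, long metaFileLen,
                          int bytesPerChecksum, int checksumSize, int crcHeaderLen) {
        // The NULL checksum reports checksumSize == 0; the unguarded division
        // throws ArithmeticException, which is what prevents DataNode startup.
        if (checksumSize == 0) {
            // One possible guard: with no checksum data there is nothing to
            // validate, so report zero verifiable chunks.
            return 0;
        }
        return Math.min(
            (blockFileLen + bytesPerChecksum - 1) / bytesPerChecksum,
            (metaFileLen - crcHeaderLen) / checksumSize);
    }

    public static void main(String[] args) {
        // 1024-byte block, 512-byte chunks, CRC32-style 4-byte checksums, 7-byte header
        System.out.println(numChunks(1024, 15, 512, 4, 7)); // 2
        // NULL checksum: previously ArithmeticException, now 0
        System.out.println(numChunks(1024, 7, 512, 0, 7));  // 0
    }
}
```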
[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster
[ https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873712#comment-13873712 ] Suresh Srinivas commented on HDFS-4600: --- I am actually surprised. Many people have expressed that this is not a bug. You also have expressed the same opinion in the comments above. What changed?
[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster
[ https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873695#comment-13873695 ] Konstantin Boudnik commented on HDFS-4600: -- Suresh, I didn't see it fixed, so yes - this still seems to be an issue.
[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster
[ https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873691#comment-13873691 ] Suresh Srinivas commented on HDFS-4600: --- [~cos], looks like you changed the priority. Is this still an issue? If not, I plan on closing this as not a problem in a day or so.
[jira] [Updated] (HDFS-4600) HDFS file append failing in multinode cluster
[ https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HDFS-4600: - Priority: Major (was: Minor)
[jira] [Updated] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-5318: Attachment: HDFS-5318b-branch-2.patch Update patch. Add overload to {{BlocksMap.getStorages()}} that filters returned Iterable by storage state. Use that overload in {{BlockManager}} instead of if...continue in loops. > Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, > HDFS-5318b-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. A minor modification would be required in the > DataNode to enable the generation of {{StorageID}} s to be pluggable behind > the {{FsDatasetSpi}} interface. 
> With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}} s associated with that block. > Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
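The counting rule proposed above can be sketched in a few lines (names are illustrative, not the actual NameNode API): the physical replica count of a block is the number of distinct {{StorageID}} s among its {{(DataNode ID, Storage ID)}} location tuples.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SharedStorageReplicaCount {
    // Each location is a {datanodeId, storageId} pair; only distinct storage
    // IDs count as distinct physical copies under the proposed semantics.
    static int physicalReplicas(List<String[]> locations) {
        Set<String> distinctStorages = new HashSet<>();
        for (String[] loc : locations) {
            distinctStorages.add(loc[1]);
        }
        return distinctStorages.size();
    }

    public static void main(String[] args) {
        // The example from the description: four access paths, two shared pools.
        List<String[]> locs = Arrays.asList(
            new String[]{"DN_1", "STORAGE_A"},
            new String[]{"DN_2", "STORAGE_A"},
            new String[]{"DN_3", "STORAGE_B"},
            new String[]{"DN_4", "STORAGE_B"});
        System.out.println(physicalReplicas(locs)); // 2, not 4
    }
}
```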
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873566#comment-13873566 ] Arpit Agarwal commented on HDFS-5318: - Hi Eric, thanks for the new patch. I will try to review it by the weekend.
[jira] [Updated] (HDFS-5153) Datanode should send block reports for each storage in a separate message
[ https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5153: Attachment: HDFS-5153.03b.patch > Datanode should send block reports for each storage in a separate message > - > > Key: HDFS-5153 > URL: https://issues.apache.org/jira/browse/HDFS-5153 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, > HDFS-5153.03b.patch > > > When the number of blocks on the DataNode grows large we start running into a > few issues: > # Block reports take a long time to process on the NameNode. In testing we > have seen that a block report with 6 Million blocks takes close to one second > to process on the NameNode. The NameSystem write lock is held during this > time. > # We start hitting the default protobuf message limit of 64MB somewhere > around 10 Million blocks. While we can increase the message size limit it > already takes over 7 seconds to serialize/deserialize a block report of this > size. > HDFS-2832 has introduced the concept of a DataNode as a collection of > storages i.e. the NameNode is aware of all the volumes (storage directories) > attached to a given DataNode. This makes it easy to split block reports from > the DN by sending one report per storage directory to mitigate the above > problems.
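The splitting idea in the description can be sketched as follows (types and names are simplified stand-ins, not the actual DatanodeProtocol): group the DataNode's blocks by storage directory and emit one report per storage instead of one report for the whole node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerStorageBlockReports {
    // Simplified stand-in for a per-storage block report message.
    static class Report {
        final String storageId;
        final List<Long> blockIds;
        Report(String storageId, List<Long> blockIds) {
            this.storageId = storageId;
            this.blockIds = blockIds;
        }
    }

    // Each storage's blocks go out as a separate message, keeping every RPC
    // well under the protobuf size limit and shortening the time the NameNode
    // holds its write lock while processing any single report.
    static List<Report> splitByStorage(Map<String, List<Long>> blocksByStorage) {
        List<Report> reports = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : blocksByStorage.entrySet()) {
            reports.add(new Report(e.getKey(), e.getValue()));
        }
        return reports;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> byStorage = new LinkedHashMap<>();
        byStorage.put("DS-1", Arrays.asList(1L, 2L, 3L));
        byStorage.put("DS-2", Arrays.asList(4L, 5L));
        System.out.println(splitByStorage(byStorage).size()); // 2 reports
    }
}
```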
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873518#comment-13873518 ] Eric Sirianni commented on HDFS-5318: - Hi Arpit - would appreciate any feedback you have on the patch when you get a chance. Also, as far as I can tell, the test failures are not related to the patch ({{testBlocksAddedWhileStandbyIsDown()}} passes in my environment and {{TestBalancerWithNodeGroup}} seems flaky -- see HDFS-4376). > Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. A minor modification would be required in the > DataNode to enable the generation of {{StorageID}}s to be pluggable behind > the {{FsDatasetSpi}} interface. 
> With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}}s associated with that block. > Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
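The proposed counting rule reduces to taking the number of distinct storage IDs over a block's locations. A minimal sketch, with hypothetical names and plain string pairs rather than the NameNode's real location data structures:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: count physical replicas as the number of
// distinct storage IDs among a block's (datanode, storage) locations.
public class ReplicaCounter {

    // Each location is {datanodeId, storageId}.
    public static int distinctReplicaCount(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] loc : locations) {
            storageIds.add(loc[1]); // only the storage ID matters
        }
        return storageIds.size();
    }

    public static void main(String[] args) {
        // The example from the description: four datanodes, two shared pools.
        List<String[]> locs = Arrays.asList(
            new String[]{"DN_1", "STORAGE_A"},
            new String[]{"DN_2", "STORAGE_A"},
            new String[]{"DN_3", "STORAGE_B"},
            new String[]{"DN_4", "STORAGE_B"});
        System.out.println(distinctReplicaCount(locs)); // prints 2, not 4
    }
}
```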
[jira] [Created] (HDFS-5788) listLocatedStatus response can be very large
Nathan Roberts created HDFS-5788: Summary: listLocatedStatus response can be very large Key: HDFS-5788 URL: https://issues.apache.org/jira/browse/HDFS-5788 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0, 0.23.10, 3.0.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Currently we limit the size of listStatus requests to a default of 1000 entries. This works fine except in the case of listLocatedStatus where the location information can be quite large. As an example, a directory with 7000 entries, 4 blocks each, 3 way replication - a listLocatedStatus response is over 1MB. This can chew up very large amounts of memory in the NN if lots of clients try to do this simultaneously. Seems like it would be better if we also considered the amount of location information being returned when deciding how many files to return. Patch will follow shortly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
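The suggested fix can be sketched as a size-aware cutoff that considers the location payload, not just the entry count. The constants, names, and size estimates below are hypothetical illustrations, not the actual patch:

```java
// Hypothetical sketch: stop adding directory entries to a
// listLocatedStatus response once the estimated serialized size of
// the location information exceeds a budget, rather than relying on
// a fixed 1000-entry limit alone.
public class ListingBudget {

    static final int MAX_ENTRIES = 1000;                 // existing-style entry cap
    static final long MAX_LOCATION_BYTES = 512 * 1024;   // illustrative byte budget

    // locationBytes[i] = estimated serialized size of entry i's block locations.
    // Returns how many entries to include in this response batch.
    public static int entriesToReturn(long[] locationBytes) {
        long total = 0;
        int n = 0;
        while (n < locationBytes.length && n < MAX_ENTRIES) {
            total += locationBytes[n];
            // Always return at least one entry so listing makes progress.
            if (total > MAX_LOCATION_BYTES && n > 0) {
                break;
            }
            n++;
        }
        return n;
    }
}
```

A caller would then continue the listing from the last returned entry, exactly as partial listings already work with the entry-count limit.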
[jira] [Commented] (HDFS-5777) Update LayoutVersion for the new editlog op OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873390#comment-13873390 ] Hudson commented on HDFS-5777: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/]) HDFS-5777. Update LayoutVersion for the new editlog op OP_ADD_BLOCK. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558675) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml > Update LayoutVersion for the new editlog op OP_ADD_BLOCK > > > Key: HDFS-5777 > URL: https://issues.apache.org/jira/browse/HDFS-5777 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 3.0.0 > > Attachments: HDFS-5704-5777-branch2.patch, HDFS-5777.000.patch, > HDFS-5777.001.patch, HDFS-5777.002.patch, editsStored, editsStored, > editsStored > > > HDFS-5704 adds a new editlog op OP_ADD_BLOCK. We need to update the > LayoutVersion for this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
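For context, the layout-version scheme the patch updates works roughly as in this sketch: each new on-disk feature (such as the OP_ADD_BLOCK editlog op) claims the next more-negative version number. The values mirror the trunk snippet quoted in HDFS-5794; the class itself is illustrative, not the real LayoutVersion code:

```java
// Illustrative sketch of layout-version bookkeeping. Layout versions
// are negative and decrease as features are added, so "newer" means
// "more negative".
public class LayoutVersions {

    public enum Feature {
        EDITLOG_ADD_BLOCK(-48),
        CACHING(-49),
        ADD_DATANODE_AND_STORAGE_UUIDS(-50);

        private final int lv;
        Feature(int lv) { this.lv = lv; }
        public int layoutVersion() { return lv; }
    }

    // A software layout version supports a feature if it is at or
    // below (more negative than) the feature's introducing version.
    public static boolean supports(Feature f, int softwareLayoutVersion) {
        return softwareLayoutVersion <= f.layoutVersion();
    }
}
```

This ordering is why the branch-2 inconsistency in HDFS-5794 matters: two branches assigning the same number to different feature sets cannot safely read each other's images.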
[jira] [Commented] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873391#comment-13873391 ] Hudson commented on HDFS-5762: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/]) HDFS-5762. BlockReaderLocal does not return -1 on EOF when doing zero-length reads (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558526) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java > BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads > -- > > Key: HDFS-5762 > URL: https://issues.apache.org/jira/browse/HDFS-5762 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.4.0 > > Attachments: HDFS-5762.001.patch, HDFS-5762.002.patch > > > Unlike the other block readers, BlockReaderLocal currently doesn't return -1 > on EOF when doing zero-length reads. This behavior, in turn, propagates to > the DFSInputStream. BlockReaderLocal should do this, so that the client can > determine whether the file is at its end by doing a zero-length read and > checking for -1. > One place this shows up is in libhdfs, which does such a 0-length read to > determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
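The contract the fix establishes can be modeled on a plain InputStream. This wrapper is a hypothetical illustration of the desired semantics (a zero-length read at end-of-file signals EOF with -1 instead of returning 0), not the BlockReaderLocal change itself:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the zero-length-read EOF contract. The
// wrapped stream must support mark/reset (e.g. BufferedInputStream),
// so we can probe for EOF without consuming data.
public class EofAwareReader {

    public static int read(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        if (len == 0) {
            in.mark(1);
            int b = in.read();      // probe one byte
            if (b == -1) {
                return -1;          // at EOF: report it even for len == 0
            }
            in.reset();             // not at EOF: put the byte back
            return 0;
        }
        return in.read(buf, off, len);
    }
}
```

This is exactly the probe libhdfs performs: a 0-length read whose return value distinguishes "stream still has data" (0) from "stream is exhausted" (-1).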
[jira] [Commented] (HDFS-5766) In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange
[ https://issues.apache.org/jira/browse/HDFS-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873394#comment-13873394 ] Hudson commented on HDFS-5766: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/]) HDFS-5766. In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange (Liang Xie via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558536) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java > In DFSInputStream, do not add datanode to deadNodes after > InvalidEncryptionKeyException in fetchBlockByteRange > -- > > Key: HDFS-5766 > URL: https://issues.apache.org/jira/browse/HDFS-5766 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Fix For: 2.4.0 > > Attachments: HDFS-5766.txt > > > Found this issue when I read the fetchBlockByteRange code: > If we hit InvalidEncryptionKeyException, the current logic is: > 1) reduce the retry count > 2) clearDataEncryptionKey > 3) addToDeadNodes > 4) retry in another loop... > If I am correct, we should treat InvalidEncryptionKeyException similarly to > the InvalidBlockToken branch, bypassing addToDeadNodes(), since it's > client-related, not caused by the DN side :) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
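The proposed handling can be sketched as a small policy table: client-side failures refresh client state and retry against the same node, while only genuine datanode failures blacklist the node. The enum, class, and method names here are hypothetical, not the DFSInputStream code:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the retry policy the fix suggests.
public class DeadNodePolicy {

    public enum Failure { INVALID_ENCRYPTION_KEY, INVALID_BLOCK_TOKEN, IO_ERROR }

    private final Set<String> deadNodes = new HashSet<>();

    public void onReadFailure(String datanode, Failure f) {
        switch (f) {
            case INVALID_ENCRYPTION_KEY:
                // Client-side: refresh the encryption key, keep the node usable.
                break;
            case INVALID_BLOCK_TOKEN:
                // Client-side: refetch the block token, keep the node usable.
                break;
            case IO_ERROR:
                // Genuine datanode problem: avoid this node on retry.
                deadNodes.add(datanode);
                break;
        }
    }

    public boolean isDead(String datanode) {
        return deadNodes.contains(datanode);
    }
}
```

Blacklisting a healthy node for a client-side key problem would needlessly shrink the set of replicas available to the retry loop, which is the behavior the patch removes.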
[jira] [Commented] (HDFS-5768) Consolidate the serialization code in DelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873388#comment-13873388 ] Hudson commented on HDFS-5768: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/]) HDFS-5768. Consolidate the serialization code in DelegationTokenSecretManager. Contributed by Haohui Mai (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558598) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > Consolidate the serialization code in DelegationTokenSecretManager > -- > > Key: HDFS-5768 > URL: https://issues.apache.org/jira/browse/HDFS-5768 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 3.0.0 > > Attachments: HDFS-5768.000.patch, HDFS-5768.001.patch > > > This jira proposes to extract a private class for the serialization code for > DelegationTokenSecretManager, so that it becomes easier to introduce new code > paths to serialize the same set of information using protobuf. > This jira does not intend to introduce any functionality changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
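The refactoring pattern being proposed (pull the load/save logic out behind one seam, so a protobuf-based path can be added later without touching the manager) might look like this sketch. All names are illustrative, and the "state" is reduced to a single int for brevity:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of consolidating serialization into one class:
// the manager delegates to a Serializer, and a protobuf format would
// simply be a second implementation of the same interface.
public class SecretManagerState {

    interface Serializer {
        void save(DataOutput out, int currentKeyId) throws IOException;
        int load(DataInput in) throws IOException;
    }

    // Legacy writable-style format.
    static class LegacySerializer implements Serializer {
        public void save(DataOutput out, int currentKeyId) throws IOException {
            out.writeInt(currentKeyId);
        }
        public int load(DataInput in) throws IOException {
            return in.readInt();
        }
    }

    // Demonstration: round-trip the state through the legacy format.
    public static int roundTrip(int currentKeyId) throws IOException {
        Serializer s = new LegacySerializer();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        s.save(new DataOutputStream(bos), currentKeyId);
        return s.load(new DataInputStream(
            new ByteArrayInputStream(bos.toByteArray())));
    }
}
```

As the description says, this kind of change moves code without changing behavior, which is why a round-trip test is the natural check.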
[jira] [Commented] (HDFS-5775) Consolidate the code for serialization in CacheManager
[ https://issues.apache.org/jira/browse/HDFS-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873387#comment-13873387 ] Hudson commented on HDFS-5775: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/]) HDFS-5775. Consolidate the code for serialization in CacheManager. Contributed by Haohui Mai (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558599) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java > Consolidate the code for serialization in CacheManager > -- > > Key: HDFS-5775 > URL: https://issues.apache.org/jira/browse/HDFS-5775 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 3.0.0 > > Attachments: HDFS-5775.000.patch > > > This jira proposes to consolidate the code that is responsible for > serializing / deserializing cache manager state into a separate class, so > that it is easier to introduce a new code path to serialize the data using > protobuf. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5768) Consolidate the serialization code in DelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873366#comment-13873366 ] Hudson commented on HDFS-5768: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/]) HDFS-5768. Consolidate the serialization code in DelegationTokenSecretManager. Contributed by Haohui Mai (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558598) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > Consolidate the serialization code in DelegationTokenSecretManager > -- > > Key: HDFS-5768 > URL: https://issues.apache.org/jira/browse/HDFS-5768 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 3.0.0 > > Attachments: HDFS-5768.000.patch, HDFS-5768.001.patch > > > This jira proposes to extract a private class for the serialization code for > DelegationTokenSecretManager, so that it becomes easier to introduce new code > paths to serialize the same set of information using protobuf. > This jira does not intend to introduce any functionality changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5766) In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange
[ https://issues.apache.org/jira/browse/HDFS-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873372#comment-13873372 ] Hudson commented on HDFS-5766: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/]) HDFS-5766. In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange (Liang Xie via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558536) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java > In DFSInputStream, do not add datanode to deadNodes after > InvalidEncryptionKeyException in fetchBlockByteRange > -- > > Key: HDFS-5766 > URL: https://issues.apache.org/jira/browse/HDFS-5766 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Fix For: 2.4.0 > > Attachments: HDFS-5766.txt > > > Found this issue when I read the fetchBlockByteRange code: > If we hit InvalidEncryptionKeyException, the current logic is: > 1) reduce the retry count > 2) clearDataEncryptionKey > 3) addToDeadNodes > 4) retry in another loop... > If I am correct, we should treat InvalidEncryptionKeyException similarly to > the InvalidBlockToken branch, bypassing addToDeadNodes(), since it's > client-related, not caused by the DN side :) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5777) Update LayoutVersion for the new editlog op OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873368#comment-13873368 ] Hudson commented on HDFS-5777: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/]) HDFS-5777. Update LayoutVersion for the new editlog op OP_ADD_BLOCK. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558675) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml > Update LayoutVersion for the new editlog op OP_ADD_BLOCK > > > Key: HDFS-5777 > URL: https://issues.apache.org/jira/browse/HDFS-5777 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 3.0.0 > > Attachments: HDFS-5704-5777-branch2.patch, HDFS-5777.000.patch, > HDFS-5777.001.patch, HDFS-5777.002.patch, editsStored, editsStored, > editsStored > > > HDFS-5704 adds a new editlog op OP_ADD_BLOCK. We need to update the > LayoutVersion for this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873369#comment-13873369 ] Hudson commented on HDFS-5762: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/]) HDFS-5762. BlockReaderLocal does not return -1 on EOF when doing zero-length reads (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558526) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java > BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads > -- > > Key: HDFS-5762 > URL: https://issues.apache.org/jira/browse/HDFS-5762 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.4.0 > > Attachments: HDFS-5762.001.patch, HDFS-5762.002.patch > > > Unlike the other block readers, BlockReaderLocal currently doesn't return -1 > on EOF when doing zero-length reads. This behavior, in turn, propagates to > the DFSInputStream. BlockReaderLocal should do this, so that the client can > determine whether the file is at its end by doing a zero-length read and > checking for -1. > One place this shows up is in libhdfs, which does such a 0-length read to > determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5775) Consolidate the code for serialization in CacheManager
[ https://issues.apache.org/jira/browse/HDFS-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873365#comment-13873365 ] Hudson commented on HDFS-5775: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/]) HDFS-5775. Consolidate the code for serialization in CacheManager. Contributed by Haohui Mai (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558599) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java > Consolidate the code for serialization in CacheManager > -- > > Key: HDFS-5775 > URL: https://issues.apache.org/jira/browse/HDFS-5775 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 3.0.0 > > Attachments: HDFS-5775.000.patch > > > This jira proposes to consolidate the code that is responsible for > serializing / deserializing cache manager state into a separate class, so > that it is easier to introduce a new code path to serialize the data using > protobuf. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-5189) Rename the "CorruptBlocks" metric to "CorruptReplicas"
[ https://issues.apache.org/jira/browse/HDFS-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned HDFS-5189: - Assignee: (was: Harsh J) > Rename the "CorruptBlocks" metric to "CorruptReplicas" > -- > > Key: HDFS-5189 > URL: https://issues.apache.org/jira/browse/HDFS-5189 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.1.0-beta >Reporter: Harsh J >Priority: Minor > > The NameNode increments a "CorruptBlocks" metric even if only one of the > block's > replicas is reported corrupt (genuine checksum fail, or even if a > replica has a bad genstamp). In cases where this is incremented, fsck > still reports a healthy state. > This is confusing to users and causes false alarms, as users feel this is the metric to be > monitored (instead of MissingBlocks). The metric is truly trying to report > only corrupt replicas, not whole blocks, and ought to be renamed. > FWIW, the "dfsadmin -report" reports a proper string of "Blocks with corrupt > replicas:" when printing this count. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5766) In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange
[ https://issues.apache.org/jira/browse/HDFS-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873282#comment-13873282 ] Hudson commented on HDFS-5766: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #454 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/454/]) HDFS-5766. In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange (Liang Xie via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558536) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java > In DFSInputStream, do not add datanode to deadNodes after > InvalidEncryptionKeyException in fetchBlockByteRange > -- > > Key: HDFS-5766 > URL: https://issues.apache.org/jira/browse/HDFS-5766 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Fix For: 2.4.0 > > Attachments: HDFS-5766.txt > > > Found this issue when I read the fetchBlockByteRange code: > If we hit InvalidEncryptionKeyException, the current logic is: > 1) reduce the retry count > 2) clearDataEncryptionKey > 3) addToDeadNodes > 4) retry in another loop... > If I am correct, we should treat InvalidEncryptionKeyException similarly to > the InvalidBlockToken branch, bypassing addToDeadNodes(), since it's > client-related, not caused by the DN side :) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5777) Update LayoutVersion for the new editlog op OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873278#comment-13873278 ] Hudson commented on HDFS-5777: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #454 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/454/]) HDFS-5777. Update LayoutVersion for the new editlog op OP_ADD_BLOCK. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558675) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml > Update LayoutVersion for the new editlog op OP_ADD_BLOCK > > > Key: HDFS-5777 > URL: https://issues.apache.org/jira/browse/HDFS-5777 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 3.0.0 > > Attachments: HDFS-5704-5777-branch2.patch, HDFS-5777.000.patch, > HDFS-5777.001.patch, HDFS-5777.002.patch, editsStored, editsStored, > editsStored > > > HDFS-5704 adds a new editlog op OP_ADD_BLOCK. We need to update the > LayoutVersion for this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)