[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247215#comment-15247215 ] Hadoop QA commented on HDFS-10284: --
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 54s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 9s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 135m 29s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
| | hadoop.hdfs.server.namenode.TestEditLog |
| | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
| | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
| | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.TestHFlush |
| | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
| | hadoop.hdfs.TestDecommissionWithStriped |
| | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
\\ \\
|| Subsystem ||
[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247129#comment-15247129 ] Walter Su commented on HDFS-10301: -- Oh, I see. In this case, the reports are not split. And because the for-loop is outside the lock, the two for-loops interleaved.
{code}
for (int r = 0; r < reports.length; r++) {
{code}
> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.1
> Reporter: Konstantin Shvachko
> Priority: Critical
> Attachments: zombieStorageLogs.rtf
>
> When the NameNode is busy, a DataNode can time out sending a block report. It then sends the block report again. The NameNode, while processing these two reports at the same time, can interleave processing of storages from different reports. This corrupts the blockReportId field, which makes the NameNode think that some storages are zombies. Replicas from zombie storages are immediately removed, causing missing blocks.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10284: - Attachment: HDFS-10284.003.patch
The v3 patch addresses [~brahmareddy]'s comment about the code comments. As the test cases {{testCheckSafeModeX()}} each need more than three words to explain, and are grouped together with a leading detailed javadoc on why we split them, I did not rename the test methods further.
> o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
> -
>
> Key: HDFS-10284
> URL: https://issues.apache.org/jira/browse/HDFS-10284
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 2.9.0
> Reporter: Mingliang Liu
> Assignee: Mingliang Liu
> Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch, HDFS-10284.002.patch, HDFS-10284.003.patch
>
> *Stacktrace*
> {code}
> org.mockito.exceptions.misusing.UnfinishedStubbingException:
> Unfinished stubbing detected here:
> -> at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> E.g. thenReturn() may be missing.
> Examples of correct stubbing:
> when(mock.isOk()).thenReturn(true);
> when(mock.isOk()).thenThrow(exception);
> doThrow(exception).when(mock).someVoidMethod();
> Hints:
> 1. missing thenReturn()
> 2. although stubbed methods may return mocks, you cannot inline mock creation (mock()) call inside a thenReturn method (see issue 53)
> at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> {code}
> Sample failing pre-commit UT:
> https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9684) DataNode stopped sending heartbeat after getting OutOfMemoryError from DataTransfer thread.
[ https://issues.apache.org/jira/browse/HDFS-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247037#comment-15247037 ] Walter Su commented on HDFS-9684: - My previous comment is incorrect. It turns out that the MR tasks swallowed all the virtual memory.
> DataNode stopped sending heartbeat after getting OutOfMemoryError from DataTransfer thread.
> ---
>
> Key: HDFS-9684
> URL: https://issues.apache.org/jira/browse/HDFS-9684
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.1
> Reporter: Surendra Singh Lilhore
> Assignee: Surendra Singh Lilhore
> Priority: Blocker
> Attachments: HDFS-9684.01.patch
>
> {noformat}
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1999)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:2008)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:857)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:671)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246999#comment-15246999 ] Mingliang Liu commented on HDFS-10306: -- Thank you [~brahmareddy] for your review (and previous work). Thanks [~jingzhao] for your review and commit. > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-10306.000.patch > > > This is a follow-up of [HDFS-10192]. > The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the > {{namesystem#inTransitionToActive()}}, while it should. According to the fix > of [HDFS-10192], we should add this check to prevent the {{smmthread}} from > calling {{leaveSafeMode()}} too early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246996#comment-15246996 ] Walter Su commented on HDFS-10301: --
1. The IPC reader is single-threaded by default. If it's multi-threaded, the order of putting RPC requests into {{callQueue}} is unspecified.
2. The IPC {{callQueue}} is FIFO.
3. The IPC handler is multi-threaded. If two handlers are both waiting on the fsn lock, the entry order depends on the fairness of the lock.
bq. When constructed as fair, threads contend for entry using an *approximately* arrival-order policy. When the currently held lock is released either the longest-waiting single writer thread will be assigned the write lock...
(quoted from https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html)
I think if the DN can't get an ack from the NN, it shouldn't assume the arrival/processing order (especially when reestablishing a connection). Well, I'm still curious about how the interleaving happened. Any thoughts?
> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.1
> Reporter: Konstantin Shvachko
> Priority: Critical
> Attachments: zombieStorageLogs.rtf
>
> When the NameNode is busy, a DataNode can time out sending a block report. It then sends the block report again. The NameNode, while processing these two reports at the same time, can interleave processing of storages from different reports. This corrupts the blockReportId field, which makes the NameNode think that some storages are zombies. Replicas from zombie storages are immediately removed, causing missing blocks.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
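The interleaving being discussed can be sketched with a small, single-threaded simulation. All names here are illustrative stand-ins, not the actual BlockManager code, and the schedule shown is just one possible interleaving when the per-report loop runs outside the lock:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each storage in a block report is processed as a
// separate step, and a retransmitted copy of the same report can interleave
// with the original. A heuristic that remembers only the last blockReportId
// seen per storage then mis-flags live storages as zombies.
class ZombieInterleaving {
    static final Map<String, Long> lastReportId = new HashMap<>();

    // One per-storage step of report processing (normally under the lock).
    static void processStorage(String storage, long reportId) {
        lastReportId.put(storage, reportId);
    }

    // The flawed zombie check: storages whose recorded id differs from the
    // finishing report's id look as if they were not in that report.
    static List<String> zombies(List<String> storages, long reportId) {
        List<String> z = new ArrayList<>();
        for (String s : storages) {
            Long seen = lastReportId.get(s);
            if (seen == null || seen != reportId) {
                z.add(s);
            }
        }
        return z;
    }

    public static void main(String[] args) {
        List<String> storages = Arrays.asList("s1", "s2");
        // A DataNode times out and resends: id 100 is the first report,
        // id 200 the retry. Their per-storage steps interleave:
        processStorage("s1", 100);  // first report handles s1
        processStorage("s1", 200);  // retry handles s1
        processStorage("s2", 200);  // retry handles s2
        processStorage("s2", 100);  // first report finishes with s2
        // Zombie detection for the retry (id 200) now falsely flags s2,
        // a live storage, and its replicas would be removed.
        System.out.println(zombies(storages, 200));  // prints [s2]
    }
}
```

The point of the sketch is that no thread misbehaves: it is the combination of per-storage steps outside a single critical section and the last-report-id-wins bookkeeping that produces a false zombie.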
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246990#comment-15246990 ] Lin Yiqun commented on HDFS-10275: -- Thanks [~walter.k.su] for the commit!
> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
> ---
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Reporter: Lin Yiqun
> Assignee: Lin Yiqun
> Fix For: 2.7.3
>
> Attachments: HDFS-10275.001.patch
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info shows:
> {code}
> Results :
> Failed tests:
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 expected: but was:
> Tests in error:
> TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for Min...
> TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for ...
> TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout occurs at line 279 in {{TestDataNodeMetrics}}. I looked into the code and found that the real reason is that the {{TotalWriteTime}} metric frequently counts 0 in each file-creation iteration, and this leads to retries until the timeout.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} metric always counts 0 is that we use {{SimulatedFSDataset}} for the write-time test. In {{SimulatedFSDataset}}, the inner class's method {{SimulatedOutputStream#write}} is used to count the write time, and that method just updates the {{length}} and throws its data away.
> {code}
> @Override
> public void write(byte[] b,
>     int off,
>     int len) throws IOException {
>   length += len;
> }
> {code}
> So the write operation costs hardly any time, and we should create the file in a real way instead of the simulated way. I have verified locally that the test passes on one attempt when I remove the simulated way, while in the old way the test retries many times to accumulate write time.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
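The effect described above can be reproduced with a minimal analog of the quoted stream. This is a sketch modeled on the snippet in the description, not the real {{SimulatedOutputStream}}: the write only bumps a length counter and discards the bytes, so timing it contributes essentially nothing to a total-write-time metric.

```java
import java.io.OutputStream;

// Minimal analog of the quoted SimulatedOutputStream: write() updates only
// the length and throws the data away, so a timed write takes ~0 ms.
class DiscardingOutputStream extends OutputStream {
    long length = 0;

    @Override
    public void write(int b) {
        length += 1;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        length += len;  // no buffer copy, no disk, no network
    }

    public static void main(String[] args) {
        DiscardingOutputStream out = new DiscardingOutputStream();
        long start = System.nanoTime();
        out.write(new byte[1024 * 1024], 0, 1024 * 1024);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // The "file" claims 1 MB written, but the measured write time
        // typically rounds to 0 ms, starving the TotalWriteTime-style metric.
        System.out.println(out.length + " bytes, " + elapsedMs + " ms");
    }
}
```

This is why the fix replaces the simulated dataset with a real write path for this particular test: only a write that actually moves bytes accumulates measurable time.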
[jira] [Commented] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value
[ https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246988#comment-15246988 ] Lin Yiqun commented on HDFS-10302: -- Thanks [~kihwal] for the quick review and commit!
> BlockPlacementPolicyDefault should use default replication considerload value
> -
>
> Key: HDFS-10302
> URL: https://issues.apache.org/jira/browse/HDFS-10302
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.7.1
> Reporter: Lin Yiqun
> Assignee: Lin Yiqun
> Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: HDFS-10302.001.patch
>
> Now the method {{BlockPlacementPolicyDefault#initialize}} just uses the literal {{true}} as the replication considerLoad default value rather than the existing constant {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
> {code}
> @Override
> public void initialize(Configuration conf, FSClusterStats stats,
>     NetworkTopology clusterMap,
>     Host2NodesMap host2datanodeMap) {
>   this.considerLoad = conf.getBoolean(
>       DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
>   this.considerLoadFactor = conf.getDouble(
>       DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
>       DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
>   this.stats = stats;
>   this.clusterMap = clusterMap;
>   this.host2datanodeMap = host2datanodeMap;
>   this.heartbeatInterval = conf.getLong(
>       DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
>       DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
>   this.tolerateHeartbeatMultiplier = conf.getInt(
>       DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
>       DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
>   this.staleInterval = conf.getLong(
>       DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY,
>       DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
>   this.preferLocalNode = conf.getBoolean(
>       DFSConfigKeys.DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
>       DFSConfigKeys.DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
> }
> {code}
> And currently the value {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is not used anywhere.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
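The change is trivial, but the pattern is worth spelling out. The sketch below uses a simplified, hypothetical stand-in for Hadoop's {{Configuration}}/{{DFSConfigKeys}} to show why reading a flag through the named {{*_DEFAULT}} constant, rather than a hard-coded literal, keeps the code and the documented default in sync:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Configuration/DFSConfigKeys, only to illustrate
// the pattern: pass the named *_DEFAULT constant, not a literal `true`.
class MiniConf {
    static final String CONSIDERLOAD_KEY = "dfs.namenode.replication.considerLoad";
    static final boolean CONSIDERLOAD_DEFAULT = true;  // the single named default

    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) {
        props.put(key, value);
    }

    boolean getBoolean(String key, boolean defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v.trim());
    }

    public static void main(String[] args) {
        MiniConf conf = new MiniConf();
        // An unset key falls back to the one authoritative default:
        System.out.println(conf.getBoolean(CONSIDERLOAD_KEY, CONSIDERLOAD_DEFAULT));  // true
        // An explicit setting still wins:
        conf.set(CONSIDERLOAD_KEY, "false");
        System.out.println(conf.getBoolean(CONSIDERLOAD_KEY, CONSIDERLOAD_DEFAULT));  // false
    }
}
```

With the literal, changing the documented default would require hunting down every call site; with the constant, there is exactly one place to update.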
[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver
[ https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246970#comment-15246970 ] Hadoop QA commented on HDFS-10264: --
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 51s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 53s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 102m 34s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 52s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 217m 56s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
| | hadoop.hdfs.server.datanode.TestDataNodeUUID |
| | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
| | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
| | hadoop.hdfs.TestFileAppend |
| | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
| |
[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246908#comment-15246908 ] Brahma Reddy Battula commented on HDFS-10284: - One minor nit: I think we can now move the comments before the test cases, or change the test case names themselves. For example, the following:
{code}
public void testCheckSafeMode3() {
// PENDING_THRESHOLD -> OFF
{code}
can be
{code}
// PENDING_THRESHOLD -> OFF
public void testCheckSafeMode3() {
{code}
> o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
> -
>
> Key: HDFS-10284
> URL: https://issues.apache.org/jira/browse/HDFS-10284
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 2.9.0
> Reporter: Mingliang Liu
> Assignee: Mingliang Liu
> Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch, HDFS-10284.002.patch
>
> *Stacktrace*
> {code}
> org.mockito.exceptions.misusing.UnfinishedStubbingException:
> Unfinished stubbing detected here:
> -> at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> E.g. thenReturn() may be missing.
> Examples of correct stubbing:
> when(mock.isOk()).thenReturn(true);
> when(mock.isOk()).thenThrow(exception);
> doThrow(exception).when(mock).someVoidMethod();
> Hints:
> 1. missing thenReturn()
> 2. although stubbed methods may return mocks, you cannot inline mock creation (mock()) call inside a thenReturn method (see issue 53)
> at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> {code}
> Sample failing pre-commit UT:
> https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246905#comment-15246905 ] Hudson commented on HDFS-10306: --- FAILURE: Integrated in Hadoop-trunk-Commit #9629 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9629/]) HDFS-10306. SafeModeMonitor should not leave safe mode if name system is (jing9: rev be0bce1b7171c49e2dca22f56d4e750e606862fc) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerSafeMode.java > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-10306.000.patch > > > This is a follow-up of [HDFS-10192]. > The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the > {{namesystem#inTransitionToActive()}}, while it should. According to the fix > of [HDFS-10192], we should add this check to prevent the {{smmthread}} from > calling {{leaveSafeMode()}} too early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10306: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Status: Resolved (was: Patch Available) I've committed this into trunk and branch-2. Thanks for the fix, [~liuml07]. > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-10306.000.patch > > > This is a follow-up of [HDFS-10192]. > The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the > {{namesystem#inTransitionToActive()}}, while it should. According to the fix > of [HDFS-10192], we should add this check to prevent the {{smmthread}} from > calling {{leaveSafeMode()}} too early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246891#comment-15246891 ] Brahma Reddy Battula commented on HDFS-10306: - LGTM, [~liuml07] thanks for taking care for this.. > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10306.000.patch > > > This is a follow-up of [HDFS-10192]. > The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the > {{namesystem#inTransitionToActive()}}, while it should. According to the fix > of [HDFS-10192], we should add this check to prevent the {{smmthread}} from > calling {{leaveSafeMode()}} too early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246886#comment-15246886 ] Jing Zhao commented on HDFS-10306: -- +1. I will commit the patch shortly. > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10306.000.patch > > > This is a follow-up of [HDFS-10192]. > The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the > {{namesystem#inTransitionToActive()}}, while it should. According to the fix > of [HDFS-10192], we should add this check to prevent the {{smmthread}} from > calling {{leaveSafeMode()}} too early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246844#comment-15246844 ] Mingliang Liu commented on HDFS-10306: -- The test failures are not related. We don't add a new test because this is a follow-up of [HDFS-10192], which added two unit tests, and [HDFS-10284] refined them.
> SafeModeMonitor should not leave safe mode if name system is starting active service
>
> Key: HDFS-10306
> URL: https://issues.apache.org/jira/browse/HDFS-10306
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Mingliang Liu
> Assignee: Mingliang Liu
> Attachments: HDFS-10306.000.patch
>
> This is a follow-up of [HDFS-10192].
> The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} does not check {{namesystem#inTransitionToActive()}}, though it should. According to the fix of [HDFS-10192], we should add this check to prevent the {{smmthread}} from calling {{leaveSafeMode()}} too early.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
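The extra condition discussed in this issue can be sketched as follows. The class and field names below are simplified, hypothetical stand-ins for the {{BlockManagerSafeMode}} and name-system members, not the actual patch:

```java
// Sketch of the guard: even when the block threshold is reached, the
// safe-mode monitor must not leave safe mode while the name system is
// still transitioning to active. Names are illustrative only.
class SafeModeGuardSketch {
    boolean blockThresholdMet;
    boolean inTransitionToActive;

    boolean canLeave() {
        // The idea of the fix in one expression: both conditions must hold.
        return blockThresholdMet && !inTransitionToActive;
    }

    public static void main(String[] args) {
        SafeModeGuardSketch g = new SafeModeGuardSketch();
        g.blockThresholdMet = true;
        g.inTransitionToActive = true;
        System.out.println(g.canLeave());  // false: still becoming active
        g.inTransitionToActive = false;
        System.out.println(g.canLeave());  // true: now safe to leave
    }
}
```

Without the second conjunct, a monitor thread that wakes up between threshold satisfaction and the completion of the active-service startup could call leaveSafeMode() too early, which is exactly the race the follow-up closes.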
[jira] [Updated] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10284: - Priority: Major (was: Minor) > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch, > HDFS-10284.002.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10232) Ozone: Make config key naming consistent
[ https://issues.apache.org/jira/browse/HDFS-10232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-10232: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) [~arpitagarwal] Thanks for the review. I have committed this to the feature branch > Ozone: Make config key naming consistent > > > Key: HDFS-10232 > URL: https://issues.apache.org/jira/browse/HDFS-10232 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Trivial > Attachments: HDFS-10232-HDFS-7240.001.patch, > HDFS-10232-HDFS-7240.002.patch, HDFS-10232-HDFS-7240.003.patch > > > We seem to use StorageHandler, ozone, Objectstore etc. as prefixes. We should > pick one -- ideally ozone -- and use that consistently as the prefix for the > ozone config management. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots
[ https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246800#comment-15246800 ] Xiao Chen commented on HDFS-8986: - The remaining test failures are not related (HDFS-10291 fixes TestShortCircuitLocalRead). I think the checkstyle warnings should be left as they are to keep the code more readable. Appreciate any review comments. Thanks! > Add option to -du to calculate directory space usage excluding snapshots > > > Key: HDFS-8986 > URL: https://issues.apache.org/jira/browse/HDFS-8986 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Reporter: Gautam Gopalakrishnan >Assignee: Xiao Chen > Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch > > > When running {{hadoop fs -du}} on a snapshotted directory (or one of its > children), the report includes space consumed by blocks that are only present > in the snapshots. This is confusing for end users. > {noformat} > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -createSnapshot /tmp/parent snap1 > Created snapshot /tmp/parent/.snapshot/snap1 > $ hadoop fs -rm -skipTrash /tmp/parent/sub1/* > ... > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -deleteSnapshot /tmp/parent snap1 > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 0 0 /tmp/parent > 0 0 /tmp/parent/sub1 > {noformat} > It would be helpful if we had a flag, say -X, to exclude any snapshot related > disk usage in the output -- This message was sent by Atlassian JIRA (v6.3.4#6332)
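The semantics requested in the description can be illustrated with a small sketch. This is not HDFS code: the {{du}} helper, the file maps, and the sizes below are made up purely to show how a snapshot-excluding flag would change the totals in the {{hadoop fs -du}} transcript above.

```java
import java.util.Map;

// Hypothetical model: "live" files are reachable through the current
// namespace; "snapshot-only" files were deleted but still have their
// blocks pinned by a snapshot (like /tmp/parent/sub1/* after the rm).
public class DuSketch {
    static long du(Map<String, Long> liveFiles,
                   Map<String, Long> snapshotOnlyFiles,
                   boolean excludeSnapshots) {
        long total = liveFiles.values().stream().mapToLong(Long::longValue).sum();
        if (!excludeSnapshots) {
            // Today's behaviour: snapshot-retained blocks are counted too.
            total += snapshotOnlyFiles.values().stream().mapToLong(Long::longValue).sum();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> live = Map.of("/tmp/parent/sub1/kept", 100L);
        // Deleted after createSnapshot, so only the snapshot still holds it.
        Map<String, Long> snapOnly = Map.of("/tmp/parent/sub1/deleted", 800L);
        System.out.println(du(live, snapOnly, false)); // 900: current -du output
        System.out.println(du(live, snapOnly, true));  // 100: proposed flag's output
    }
}
```

Under the proposed flag, the second {{-du}} in the transcript would report only the 799.7 M still reachable outside the snapshot, resolving the end-user confusion.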
[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots
[ https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246786#comment-15246786 ] Hadoop QA commented on HDFS-8986: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 42s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 3m 13s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 52s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 39s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 4s {color} | {color:red} root: patch generated 3 new + 166 unchanged - 11 fixed = 169 total (was 177) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 12s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 51s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 40s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} |
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246785#comment-15246785 ] Hadoop QA commented on HDFS-3702: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 22 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 4s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 49s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 47s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 3m 16s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 55s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 50s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 50s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 11s {color} | {color:red} root: patch generated 10 new + 675 unchanged - 7 fixed = 685 total (was 682) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 16s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 44s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 42s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 22s {color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green}
[jira] [Commented] (HDFS-9869) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-2]
[ https://issues.apache.org/jira/browse/HDFS-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246771#comment-15246771 ] Zhe Zhang commented on HDFS-9869: - Thanks Rakesh for the update. Patch LGTM overall. A few remaining minor issues: # Chained deprecation below. Maybe we should combine them? {code} | dfs.replication.pending.timeout.sec | dfs.namenode.replication.pending.timeout-sec | | dfs.namenode.replication.pending.timeout-sec | dfs.namenode.reconstruction.pending.timeout-sec | {code} # As a follow-up we should also deprecate other replication-related config keys. # Renaming {{getExcessBlocksCount}} to {{getExtraBlocksCount}} doesn't seem necessary. {{FSNamesystem}} is using "excess" anyway. Sorry my previous comment pointed to this one. I just noticed the original name uses "blocks" instead of "replicas". Maybe we can also keep the "excess" in {{excessRedundancyMap}}? This matches better with the getter. +1 after the above are addressed. > Erasure Coding: Rename replication-based names in BlockManager to more > generic [part-2] > --- > > Key: HDFS-9869 > URL: https://issues.apache.org/jira/browse/HDFS-9869 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-9869-001.patch, HDFS-9869-002.patch, > HDFS-9869-003.patch, HDFS-9869-004.patch > > > The idea of this jira is to rename the following entities in BlockManager as, > - {{PendingReplicationBlocks}} to {{PendingReconstructionBlocks}} > - {{excessReplicateMap}} to {{extraRedundancyMap}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
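"Combining" the chained deprecation quoted above amounts to following old-key -> new-key mappings until reaching a key that is no longer deprecated. A minimal sketch of that resolution, using plain Java rather than Hadoop's {{Configuration}} deprecation machinery (the {{resolve}} helper is hypothetical; only the two key names come from the comment):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DeprecationChain {
    // Follow the deprecation chain from an old key to its final name.
    // The "seen" set guards against an accidental cycle of deprecations.
    static String resolve(Map<String, String> deprecations, String key) {
        Set<String> seen = new HashSet<>();
        while (deprecations.containsKey(key) && seen.add(key)) {
            key = deprecations.get(key);
        }
        return key;
    }

    public static void main(String[] args) {
        Map<String, String> d = new HashMap<>();
        // The two entries mirror the chain quoted in the review comment.
        d.put("dfs.replication.pending.timeout.sec",
              "dfs.namenode.replication.pending.timeout-sec");
        d.put("dfs.namenode.replication.pending.timeout-sec",
              "dfs.namenode.reconstruction.pending.timeout-sec");
        System.out.println(resolve(d, "dfs.replication.pending.timeout.sec"));
        // -> dfs.namenode.reconstruction.pending.timeout-sec
    }
}
```

Combining the chain into a single entry mapping the oldest key directly to the newest one would make both old names resolve in one hop.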
[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.
[ https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246716#comment-15246716 ] Rushabh S Shah commented on HDFS-10305: --- I think it is not a failed operation. > Hdfs audit shouldn't log mkdir operation if the directory already exists. > > > Key: HDFS-10305 > URL: https://issues.apache.org/jira/browse/HDFS-10305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Minor > > Currently Hdfs audit logs mkdir operation even if the directory already > exists. > This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.
[ https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246710#comment-15246710 ] Mingliang Liu commented on HDFS-10305: -- Agreed. > Hdfs audit shouldn't log mkdir operation if the directory already exists. > > > Key: HDFS-10305 > URL: https://issues.apache.org/jira/browse/HDFS-10305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Minor > > Currently Hdfs audit logs mkdir operation even if the directory already > exists. > This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.
[ https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246701#comment-15246701 ] Ravi Prakash commented on HDFS-10305: - I believe the audit log was supposed to capture failed operations as well. I'd be inclined to close this JIRA as WON'T FIX > Hdfs audit shouldn't log mkdir operation if the directory already exists. > > > Key: HDFS-10305 > URL: https://issues.apache.org/jira/browse/HDFS-10305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Minor > > Currently Hdfs audit logs mkdir operation even if the directory already > exists. > This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9530) huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246697#comment-15246697 ] Ravi Prakash commented on HDFS-9530: To answer one of my own questions: "Could you please point me to the code where you see this happening?" In 2, Brahma is likely referring to [BlockReceiver:283|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java#L283] -> [ReplicaInPipeline:163|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java#L163] -> [FsVolumeImpl:480|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java#L480] In 3, Brahma is likely referring to [BlockReceiver:956|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java#L956] -> [ReplicaInPipeline:163|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java#L163] -> [FsVolumeImpl:480|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java#L480] > huge Non-DFS Used in hadoop 2.6.2 & 2.7.1 > - > > Key: HDFS-9530 > URL: https://issues.apache.org/jira/browse/HDFS-9530 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Fei Hui > Attachments: HDFS-9530-01.patch > > > i think there are bugs in HDFS > === > here is config > > dfs.datanode.data.dir > > > file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2 > > > here is dfsadmin report > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is 
deprecated. > Instead use the hdfs command for it. > Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 238604832768 (222.22 GB) > DFS Remaining: 215772954624 (200.95 GB) > DFS Used: 22831878144 (21.26 GB) > DFS Used%: 9.57% > Under replicated blocks: 4 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7190958080 (6.70 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72343986176 (67.38 GB) > DFS Used%: 8.96% > DFS Remaining%: 90.14% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:02 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7219073024 (6.72 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72315871232 (67.35 GB) > DFS Used%: 9.00% > DFS Remaining%: 90.11% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 8421847040 (7.84 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 71113097216 (66.23 GB) > DFS Used%: 10.49% > DFS Remaining%: 88.61% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > > when running hive job , dfsadmin report as follows > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. 
> Instead use the hdfs command for it. > Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 108266011136 (100.83 GB) > DFS Remaining: 80078416384 (74.58 GB) > DFS Used: 28187594752 (26.25 GB) > DFS Used%: 26.04% > Under replicated blocks: 7 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 9015627776 (8.40 GB) > Non DFS Used: 44303742464 (41.26 GB) > DFS Remaining:
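The jump in "Non DFS Used" while the Hive job runs is consistent with how the code paths above account for space. A minimal arithmetic sketch, assuming non-DFS used is derived as capacity minus DFS used minus remaining, and that bytes reserved for in-flight replicas (the BlockReceiver -> ReplicaInPipeline -> FsVolumeImpl chain) are subtracted from "remaining"; all concrete numbers below except the configured capacity and DFS used of worker-2 are hypothetical:

```java
// Sketch only, not the HDFS implementation: shows why space reserved for
// replicas currently being written can surface as "Non DFS Used" even
// though no on-disk file actually occupies it yet.
public class NonDfsUsedSketch {
    static long nonDfsUsed(long capacity, long dfsUsed, long remaining) {
        // Derived value: whatever capacity is neither DFS data nor free.
        return capacity - dfsUsed - remaining;
    }

    public static void main(String[] args) {
        long capacity = 80_256_417_792L;          // worker-2, from the report
        long dfsUsed  =  9_015_627_776L;          // worker-2, during the job
        long actuallyFree = 70_000_000_000L;      // hypothetical free disk space
        long reservedForWrites = 40_000_000_000L; // hypothetical in-flight replicas

        // Reserved bytes are deducted from "remaining" while blocks are
        // being received, so they land in the non-DFS-used bucket.
        long remaining = actuallyFree - reservedForWrites;
        System.out.println(nonDfsUsed(capacity, dfsUsed, remaining));
        // Roughly 41 GB here; once the writes finalize, the reserved bytes
        // move into DFS used (or are released) and the figure shrinks back.
    }
}
```

This matches the symptom in the report: a huge transient "Non DFS Used" during heavy write load that disappears when the job finishes.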
[jira] [Commented] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally times out after 300 seconds
[ https://issues.apache.org/jira/browse/HDFS-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246648#comment-15246648 ] Daniel Templeton commented on HDFS-9744: LGTM > TestDirectoryScanner#testThrottling occasionally times out after 300 seconds > --- > > Key: HDFS-9744 > URL: https://issues.apache.org/jira/browse/HDFS-9744 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Lin Yiqun >Priority: Minor > Labels: test > Attachments: HDFS-9744.001.patch > > > I have seen quite a few test failures in TestDirectoryScanner#testThrottling. > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/ > Looking at the log, it does not look like the test got stuck. On my local > machine, this test took 219 seconds. It is likely that this test takes more > than 300 seconds to complete on a busy Jenkins slave. I think it is > reasonable to set a longer timeout value, or reduce the number of blocks to > reduce the duration of the test. 
> Error Message > {noformat} > test timed out after 300000 milliseconds > {noformat} > Stacktrace > {noformat} > java.lang.Exception: test timed out after 300000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:503) > at > org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804) > at > org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423) > at > org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432) > at > org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418) > at > org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217) > at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125) > at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376) > at > org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108) > at > org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246647#comment-15246647 ] Hadoop QA commented on HDFS-10306: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 1m 49s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 58s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 44s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 134m 59s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.TestHFlush | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12799326/HDFS-10306.000.patch | | JIRA Issue | HDFS-10306 | | Optional Tests |
[jira] [Commented] (HDFS-10304) implement moveToLocal or remove it from the usage list
[ https://issues.apache.org/jira/browse/HDFS-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246620#comment-15246620 ] Xiaobing Zhou commented on HDFS-10304: -- [~steve_l] thanks for checking this. How about implementing it as a combination of get and rm? > implement moveToLocal or remove it from the usage list > -- > > Key: HDFS-10304 > URL: https://issues.apache.org/jira/browse/HDFS-10304 > Project: Hadoop HDFS > Issue Type: Improvement > Components: scripts >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Xiaobing Zhou >Priority: Minor > > if you get the usage list of {{hdfs dfs}} it tells you of "-moveToLocal". > If you try to use the command, it tells you "Option '-moveToLocal' is not > implemented yet." > Either the command should be implemented, or it should be removed from the > usage list, as it is not technically a command you can use, except in the > special case of "I want my shell to print "not implemented yet"" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
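The "get + rm" composition suggested above can be sketched as copy-then-delete, where the source is removed only after the copy has fully succeeded so a failed copy never loses data. This sketch uses local {{java.nio.file}} paths purely as stand-ins; a real HDFS implementation would go through Hadoop's {{FileSystem}} API instead, and the {{moveToLocal}} helper name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveToLocalSketch {
    // "get" then "rm": copy first, and delete the source only if the
    // copy completed; an exception during the copy leaves src intact.
    static void moveToLocal(Path src, Path dst) throws IOException {
        Files.copy(src, dst, StandardCopyOption.COPY_ATTRIBUTES);
        Files.delete(src); // only reached when the copy succeeded
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("remote", ".dat");
        Files.writeString(src, "payload");
        Path dst = Files.createTempDirectory("local").resolve("remote.dat");

        moveToLocal(src, dst);
        System.out.println(Files.notExists(src)
            && Files.readString(dst).equals("payload")); // -> true
    }
}
```

The delete-after-copy ordering is the key design choice: it gives move semantics without ever being in a state where the data exists in neither place.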
[jira] [Commented] (HDFS-10232) Ozone: Make config key naming consistent
[ https://issues.apache.org/jira/browse/HDFS-10232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246618#comment-15246618 ] Anu Engineer commented on HDFS-10232: - Test failures are not related to this patch. > Ozone: Make config key naming consistent > > > Key: HDFS-10232 > URL: https://issues.apache.org/jira/browse/HDFS-10232 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Trivial > Attachments: HDFS-10232-HDFS-7240.001.patch, > HDFS-10232-HDFS-7240.002.patch, HDFS-10232-HDFS-7240.003.patch > > > We seem to use StorageHandler, ozone, Objectstore etc as prefix. We should > pick one -- Ideally ozone and use that consistently as the prefix for the > ozone config management. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10232) Ozone: Make config key naming consistent
[ https://issues.apache.org/jira/browse/HDFS-10232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246613#comment-15246613 ] Hadoop QA commented on HDFS-10232: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 11 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 46s {color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s {color} | {color:green} HDFS-7240 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s {color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s {color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s {color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} HDFS-7240 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 45s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in HDFS-7240 has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s {color} | {color:green} HDFS-7240 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 5s {color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 48s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 47s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 3s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 197m 2s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.namenode.TestEditLog | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA | |
[jira] [Commented] (HDFS-3743) QJM: improve formatting behavior for JNs
[ https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246601#comment-15246601 ] Jian Fang commented on HDFS-3743: - More likely the above issue was caused by some race condition in restarting name nodes and journal nodes instead of my code changes. Will create a separate JIRA to add the "-newEditsOnly" option to initializeSharedEdits and link it here later. > QJM: improve formatting behavior for JNs > > > Key: HDFS-3743 > URL: https://issues.apache.org/jira/browse/HDFS-3743 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon > > Currently, the JournalNodes automatically format themselves when a new writer > takes over, if they don't have any data for that namespace. However, this has > a few problems: > 1) if the administrator accidentally points a new NN at the wrong quorum (eg > corresponding to another cluster), it will auto-format a directory on those > nodes. This doesn't cause any data loss, but would be better to bail out with > an error indicating that they need to be formatted. > 2) if a journal node crashes and needs to be reformatted, it should be able > to re-join the cluster and start storing new segments without having to fail > over to a new NN. > 3) if 2/3 JNs get accidentally reformatted (eg the mount point becomes > undone), and the user starts the NN, it should fail to start, because it may > end up missing edits. If it auto-formats in this case, the user might have > silent "rollback" of the most recent edits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
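The three failure modes Todd lists reduce to two simple checks. The sketch below is purely illustrative (class, enum, and method names are hypothetical, not the actual QJM API): refuse to auto-format an unformatted journal when a new writer takes over, and refuse to start the NN unless a majority of JNs still hold edits.

```java
// Illustrative decision sketch for the JN formatting behavior discussed above.
// All names are hypothetical; this is not the actual QuorumJournalManager code.
class JournalFormatSketch {
    enum Action { ACCEPT_WRITER, REFUSE_NEEDS_FORMAT }

    // Case 1: a new writer must not silently auto-format an empty journal;
    // an NN pointed at the wrong quorum should get an error instead.
    static Action onNewWriter(boolean journalFormatted) {
        return journalFormatted ? Action.ACCEPT_WRITER : Action.REFUSE_NEEDS_FORMAT;
    }

    // Case 3: the NN may only start if a majority of JNs are still formatted;
    // with 2 of 3 JNs accidentally reformatted this returns false.
    static boolean nnMayStart(int formattedJns, int totalJns) {
        return formattedJns > totalJns / 2;
    }
}
```

Under this sketch, a crashed-and-reformatted JN (case 2) would be explicitly reformatted by the operator and then rejoin, rather than being auto-formatted by the next writer.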
[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver
[ https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246596#comment-15246596 ] Xiaobing Zhou commented on HDFS-10264: -- [~boky01] and [~shv], thanks for the comments. v001 used SLF4J logging. > Logging improvements in FSImageFormatProtobuf.Saver > --- > > Key: HDFS-10264 > URL: https://issues.apache.org/jira/browse/HDFS-10264 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Assignee: Xiaobing Zhou > Labels: newbie > Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch > > > There are two LOG messages in {{FSImageFormat.Saver}} that are > missing in {{FSImageFormatProtobuf.Saver}}, which mark the start and end of > fsimage saving. It would be good to have them logged for protobuf images as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
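For context, the two messages in question bracket the fsimage save. A minimal self-contained sketch is below; the wording and method names are illustrative assumptions (the actual patch adds SLF4J LOG calls inside FSImageFormatProtobuf.Saver, not helper methods like these):

```java
// Sketch of the start/end messages bracketing an fsimage save, as discussed
// above. Message wording is illustrative, not a quote of the real LOG lines.
class SaverLogSketch {
    static String startMessage(String fileName) {
        return "Saving image file " + fileName + " using no compression";
    }
    static String endMessage(String fileName, long sizeBytes, long seconds) {
        return "Image file " + fileName + " of size " + sizeBytes
            + " bytes saved in " + seconds + " seconds";
    }
}
```

With SLF4J the same messages would use parameterized logging, e.g. `LOG.info("Saving image file {} using no compression", fileName)`, which avoids string concatenation when the level is disabled.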
[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver
[ https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246587#comment-15246587 ] Andras Bokor commented on HDFS-10264: - [~xiaobingo] Please check the comments above. SLF4J is preferred. > Logging improvements in FSImageFormatProtobuf.Saver > --- > > Key: HDFS-10264 > URL: https://issues.apache.org/jira/browse/HDFS-10264 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Assignee: Xiaobing Zhou > Labels: newbie > Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch > > > There are two missing LOG messages in {{FSImageFormat.Saver}} that are > missing in {{FSImageFormatProtobuf.Saver}}, which mark start and end of > fsimage saving. Would be good to have them logged for protobuf images as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver
[ https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-10264: - Attachment: HDFS-10264.001.patch > Logging improvements in FSImageFormatProtobuf.Saver > --- > > Key: HDFS-10264 > URL: https://issues.apache.org/jira/browse/HDFS-10264 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Assignee: Xiaobing Zhou > Labels: newbie > Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch > > > There are two missing LOG messages in {{FSImageFormat.Saver}} that are > missing in {{FSImageFormatProtobuf.Saver}}, which mark start and end of > fsimage saving. Would be good to have them logged for protobuf images as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246571#comment-15246571 ] Konstantin Shvachko commented on HDFS-10301: Hey Daryn, I am not sure how HDFS-9198 prevents this from occurring. DataNodes are still waiting for the NN to process each BR, so they can time out and send the same block report multiple times. On the NN side, BR ops processing is multi-threaded, so it can still interleave processing storages from different reports. Could you please clarify what I am missing? > Blocks removed by thousands due to falsely detected zombie storages > --- > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: zombieStorageLogs.rtf > > > When the NameNode is busy a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombies. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
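The interleaving can be modeled abstractly: if the NN stamps each storage with the id of the report that last touched it, and after a full report treats every storage whose stamp differs from the current report id as a zombie, then two in-flight copies of the same report can leave a live storage with a stale stamp. This toy model uses hypothetical names and is not the actual BlockManager code:

```java
import java.util.*;

// Toy model of zombie-storage detection: after processing a full block report,
// any storage whose stamp is not the current report id is declared a zombie.
class ZombieSketch {
    static Set<String> zombies(Map<String, Long> storageStamp, long curReportId) {
        Set<String> z = new HashSet<>();
        for (Map.Entry<String, Long> e : storageStamp.entrySet()) {
            if (e.getValue() != curReportId) { // stale stamp => falsely "zombie"
                z.add(e.getKey());
            }
        }
        return z;
    }
}
```

If the retransmitted report (id 2) stamps storage s2 while the first copy (id 1) had already stamped s1, the final sweep under id 2 finds s1 carrying the stale stamp 1 and wrongly removes its replicas.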
[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver
[ https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246550#comment-15246550 ] Xiaobing Zhou commented on HDFS-10264: -- I posted the patch v000, please kindly review, thanks. > Logging improvements in FSImageFormatProtobuf.Saver > --- > > Key: HDFS-10264 > URL: https://issues.apache.org/jira/browse/HDFS-10264 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Assignee: Xiaobing Zhou > Labels: newbie > Attachments: HDFS-10264.000.patch > > > There are two missing LOG messages in {{FSImageFormat.Saver}} that are > missing in {{FSImageFormatProtobuf.Saver}}, which mark start and end of > fsimage saving. Would be good to have them logged for protobuf images as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver
[ https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-10264: - Attachment: HDFS-10264.000.patch > Logging improvements in FSImageFormatProtobuf.Saver > --- > > Key: HDFS-10264 > URL: https://issues.apache.org/jira/browse/HDFS-10264 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Assignee: Xiaobing Zhou > Labels: newbie > Attachments: HDFS-10264.000.patch > > > There are two missing LOG messages in {{FSImageFormat.Saver}} that are > missing in {{FSImageFormatProtobuf.Saver}}, which mark start and end of > fsimage saving. Would be good to have them logged for protobuf images as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver
[ https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-10264: - Status: Patch Available (was: Open) > Logging improvements in FSImageFormatProtobuf.Saver > --- > > Key: HDFS-10264 > URL: https://issues.apache.org/jira/browse/HDFS-10264 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Assignee: Xiaobing Zhou > Labels: newbie > Attachments: HDFS-10264.000.patch > > > There are two missing LOG messages in {{FSImageFormat.Saver}} that are > missing in {{FSImageFormatProtobuf.Saver}}, which mark start and end of > fsimage saving. Would be good to have them logged for protobuf images as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag
[ https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246526#comment-15246526 ] Hudson commented on HDFS-10265: --- FAILURE: Integrated in Hadoop-trunk-Commit #9628 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9628/]) HDFS-10265. OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has (cmccabe: rev cb3ca460efb97be8c031bdb14bb7705cc25f2117) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java > OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag > - > > Key: HDFS-10265 > URL: https://issues.apache.org/jira/browse/HDFS-10265 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.4.1, 2.7.1 >Reporter: Wan Chang >Assignee: Wan Chang >Priority: Minor > Labels: patch > Fix For: 2.8.0 > > Attachments: HDFS-10265-001.patch, HDFS-10265-002.patch > > > I use OEV tool to convert editlog to xml file, then convert the xml file back > to binary editslog file(so that low version NameNode can load edits that > generated by higher version NameNode). But when OP_UPDATE_BLOCKS has no BLOCK > tag, the OEV tool doesn't handle the case and exits with InvalidXmlException. > Here is the stack: > {code} > fromXml error decoding opcode null > {{"/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5"}, > {"-2"}, {}, > {"3875711"}} > Encountered exception. Exiting: no entry found for BLOCK > org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for > BLOCK > at > org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:908) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:3942) > ... 
> {code} > Here is part of the xml file: > {code} > > OP_UPDATE_BLOCKS > > 3875711 > > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 > > -2 > > > {code} > I tracked the NN's log and found those operations: > 0. The file > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 is > very small and contains only one block. > 1. Client asked NN to add a block to the file. > 2. Client failed to write to DN and asked NameNode to abandon the block. > 3. NN removed the block and wrote an OP_UPDATE_BLOCKS to the editlog. > Finally NN generated an OP_UPDATE_BLOCKS with no BLOCK tags. > In FSEditLogOp$UpdateBlocksOp.fromXml, we need to handle the case above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
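The fix direction described in the last line can be sketched as a tolerant lookup: treat a missing BLOCK entry as "zero blocks" instead of throwing. The stanza is modeled here as a plain map; this is a hypothetical stand-in, not the actual XMLUtils/FSEditLogOp API:

```java
import java.util.*;

// Tolerant parse sketch: an OP_UPDATE_BLOCKS stanza with no BLOCK children
// yields an empty block list instead of an InvalidXmlException.
class UpdateBlocksSketch {
    static List<String> blocksFromXml(Map<String, List<String>> stanza) {
        List<String> blocks = stanza.get("BLOCK");
        // Before the fix, an absent key would throw "no entry found for BLOCK".
        return blocks != null ? blocks : Collections.emptyList();
    }
}
```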
[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag
[ https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10265: Resolution: Fixed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Status: Resolved (was: Patch Available) > OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag > - > > Key: HDFS-10265 > URL: https://issues.apache.org/jira/browse/HDFS-10265 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.4.1, 2.7.1 >Reporter: Wan Chang >Assignee: Wan Chang >Priority: Minor > Labels: patch > Fix For: 2.8.0 > > Attachments: HDFS-10265-001.patch, HDFS-10265-002.patch > > > I use OEV tool to convert editlog to xml file, then convert the xml file back > to binary editslog file(so that low version NameNode can load edits that > generated by higher version NameNode). But when OP_UPDATE_BLOCKS has no BLOCK > tag, the OEV tool doesn't handle the case and exits with InvalidXmlException. > Here is the stack: > {code} > fromXml error decoding opcode null > {{"/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5"}, > {"-2"}, {}, > {"3875711"}} > Encountered exception. Exiting: no entry found for BLOCK > org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for > BLOCK > at > org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:908) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:3942) > ... > {code} > Here is part of the xml file: > {code} > > OP_UPDATE_BLOCKS > > 3875711 > > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 > > -2 > > > {code} > I tracked the NN's log and found those operation: > 0. The file > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 is > very small and contains only one block. > 1. Client ask NN to add block to the file. > 2. 
Client failed to write to DN and asked NameNode to abandon the block. > 3. NN removed the block and wrote an OP_UPDATE_BLOCKS to the editlog. > Finally NN generated an OP_UPDATE_BLOCKS with no BLOCK tags. > In FSEditLogOp$UpdateBlocksOp.fromXml, we need to handle the case above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag
[ https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246503#comment-15246503 ] Colin Patrick McCabe commented on HDFS-10265: - +1. Thanks, [~wanchang]. > OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag > - > > Key: HDFS-10265 > URL: https://issues.apache.org/jira/browse/HDFS-10265 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.4.1, 2.7.1 >Reporter: Wan Chang >Assignee: Wan Chang >Priority: Minor > Labels: patch > Attachments: HDFS-10265-001.patch, HDFS-10265-002.patch > > > I use OEV tool to convert editlog to xml file, then convert the xml file back > to binary editslog file(so that low version NameNode can load edits that > generated by higher version NameNode). But when OP_UPDATE_BLOCKS has no BLOCK > tag, the OEV tool doesn't handle the case and exits with InvalidXmlException. > Here is the stack: > {code} > fromXml error decoding opcode null > {{"/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5"}, > {"-2"}, {}, > {"3875711"}} > Encountered exception. Exiting: no entry found for BLOCK > org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for > BLOCK > at > org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:908) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:3942) > ... > {code} > Here is part of the xml file: > {code} > > OP_UPDATE_BLOCKS > > 3875711 > > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 > > -2 > > > {code} > I tracked the NN's log and found those operation: > 0. The file > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 is > very small and contains only one block. > 1. Client ask NN to add block to the file. > 2. 
Client failed to write to DN and asked NameNode to abandon the block. > 3. NN removed the block and wrote an OP_UPDATE_BLOCKS to the editlog. > Finally NN generated an OP_UPDATE_BLOCKS with no BLOCK tags. > In FSEditLogOp$UpdateBlocksOp.fromXml, we need to handle the case above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver
[ https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246500#comment-15246500 ] Konstantin Shvachko commented on HDFS-10264: Makes sense. Good suggestions, Andras. > Logging improvements in FSImageFormatProtobuf.Saver > --- > > Key: HDFS-10264 > URL: https://issues.apache.org/jira/browse/HDFS-10264 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Assignee: Xiaobing Zhou > Labels: newbie > > There are two missing LOG messages in {{FSImageFormat.Saver}} that are > missing in {{FSImageFormatProtobuf.Saver}}, which mark start and end of > fsimage saving. Would be good to have them logged for protobuf images as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246466#comment-15246466 ] Hadoop QA commented on HDFS-9958: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 0s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 20s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 137m 12s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.TestFileCorruption | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.TestFileCorruption | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12799304/HDFS-9958.002.patch | | JIRA Issue | HDFS-9958 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 6e7a2dcae2f5
[jira] [Updated] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-3702: Attachment: HDFS-3702.012.patch bq. BTW, we should check if excludedNodes.contains(writer) is already true; otherwise, fallback does not help. Fixed in the newly updated patch. [~szetszwo] What do you think about [~cmccabe] and [~stack]'s suggestions? Would that work for you? Thanks! > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, > HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
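The review exchange above (only exclude the writer when it is not already excluded, so the fallback is meaningful) can be modeled with a small sketch. All names are illustrative assumptions, not the actual BlockPlacementPolicy API:

```java
import java.util.*;

// Sketch of the "no local write" fallback: exclude the local writer node only
// if it is not already excluded; if allocation then comes up short, retry
// without the local-node exclusion so the write can still succeed.
class NoLocalWriteSketch {
    static List<String> chooseTargets(List<String> live, Set<String> excluded,
                                      String writer, boolean noLocalWrite, int replicas) {
        Set<String> ex = new HashSet<>(excluded);
        boolean addedWriter = noLocalWrite && !ex.contains(writer) && ex.add(writer);
        List<String> targets = pick(live, ex, replicas);
        if (targets.size() < replicas && addedWriter) {
            ex.remove(writer); // fall back: allow the local node again
            targets = pick(live, ex, replicas);
        }
        return targets;
    }

    // Stand-in for the real placement policy: first n non-excluded live nodes.
    static List<String> pick(List<String> live, Set<String> ex, int n) {
        List<String> out = new ArrayList<>();
        for (String d : live) {
            if (!ex.contains(d) && out.size() < n) out.add(d);
        }
        return out;
    }
}
```

The `addedWriter` flag captures the reviewer's point: if the writer was already in `excludedNodes` for some other reason, removing it in the fallback would change behavior the caller asked for, so the retry only undoes an exclusion this code itself added.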
[jira] [Updated] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots
[ https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-8986: Attachment: HDFS-8986.02.patch Patch 2 fixes all reported errors. The 2 added tests in {{DFSShell}} pass locally; I am not sure why they failed on Jenkins. > Add option to -du to calculate directory space usage excluding snapshots > > > Key: HDFS-8986 > URL: https://issues.apache.org/jira/browse/HDFS-8986 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Reporter: Gautam Gopalakrishnan >Assignee: Xiao Chen > Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch > > > When running {{hadoop fs -du}} on a snapshotted directory (or one of its > children), the report includes space consumed by blocks that are only present > in the snapshots. This is confusing for end users. > {noformat} > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -createSnapshot /tmp/parent snap1 > Created snapshot /tmp/parent/.snapshot/snap1 > $ hadoop fs -rm -skipTrash /tmp/parent/sub1/* > ... > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -deleteSnapshot /tmp/parent snap1 > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 0 0 /tmp/parent > 0 0 /tmp/parent/sub1 > {noformat} > It would be helpful if we had a flag, say -X, to exclude any snapshot related > disk usage in the output -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9016) Display upgrade domain information in fsck
[ https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246375#comment-15246375 ] Allen Wittenauer commented on HDFS-9016: fsck has already shipped in a previous release of Hadoop. Changing its output is not a compatible change in the entirety of branch-2. > Display upgrade domain information in fsck > -- > > Key: HDFS-9016 > URL: https://issues.apache.org/jira/browse/HDFS-9016 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9016.patch > > > This will make it easy for people to use fsck to check block placement when > upgrade domain is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10306: - Description: This is a follow-up of [HDFS-10192]. The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the {{namesystem#inTransitionToActive()}}, while it should. According to the fix of [HDFS-10192], we should add this check to prevent the {{smmthread}} from calling {{leaveSafeMode()}} too early. was: This is a follow-up of [HDFS-10192]. The {{BlockManagerSafeMode$SafeModeMonitor#canLeave90}} is not checking the {{namesystem#inTransitionToActive()}}, while it should. According to the fix of [HDFS-10192], we should add this check to prevent the {{smmthread}} from calling {{leaveSafeMode()}} too early. > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10306.000.patch > > > This is a follow-up of [HDFS-10192]. > The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the > {{namesystem#inTransitionToActive()}}, while it should. According to the fix > of [HDFS-10192], we should add this check to prevent the {{smmthread}} from > calling {{leaveSafeMode()}} too early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
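The proposed check amounts to one extra condition. The sketch below uses the method names from the JIRA text but a purely illustrative wrapper class; the actual logic lives inside BlockManagerSafeMode and its monitor thread:

```java
// Sketch: the safe-mode monitor may only leave safe mode once the block/DN
// thresholds are met AND the name system is no longer transitioning to active.
class SafeModeSketch {
    static boolean canLeave(boolean thresholdsMet, boolean inTransitionToActive) {
        return thresholdsMet && !inTransitionToActive;
    }
}
```

Without the second condition, the smmthread can observe the thresholds satisfied mid-transition and call leaveSafeMode() too early, which is the race this JIRA addresses.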
[jira] [Updated] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10306: - Status: Patch Available (was: Open) > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10306.000.patch > > > This is a follow-up of [HDFS-10192]. > The {{BlockManagerSafeMode$SafeModeMonitor#canLeave90}} is not checking the > {{namesystem#inTransitionToActive()}}, while it should. According to the fix > of [HDFS-10192], we should add this check to prevent the {{smmthread}} from > calling {{leaveSafeMode()}} too early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10306: - Attachment: HDFS-10306.000.patch Thanks [~walter.k.su] for suggesting separating this code out of [HDFS-10284], which addresses the intermittent failure of the {{o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode}} unit test. The v0 patch was from the v1 patch of [HDFS-10284]. Please review. > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10306.000.patch > > > This is a follow-up of [HDFS-10192]. > The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the > {{namesystem#inTransitionToActive()}}, while it should. According to the fix > of [HDFS-10192], we should add this check to prevent the {{smmthread}} from > calling {{leaveSafeMode()}} too early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10284: - Attachment: HDFS-10284.002.patch Thank you [~walter.k.su] and [~vinayrpet] for your kind review. The v2 patch moves the {{namesystem#inTransitionToActive()}} changes into a separate patch, as [~walter.k.su] suggested. I created jira [HDFS-10306] for this. > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch, > HDFS-10284.002.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
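The {{UnfinishedStubbingException}} above is often a thread-safety symptom rather than a literally missing {{thenReturn()}}: Mockito stubbing is not atomic, so a background thread (here, the safe-mode monitor) can call into the mock between {{when(...)}} and {{thenReturn(...)}}. A JDK-only sketch of that window follows; it models the race with a nullable field rather than Mockito itself, and all names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class StubbingRaceSketch {
    // null models "when() has been called but thenReturn() has not yet run".
    static volatile Boolean stubbedAnswer;

    // Returns true if the background thread observed the half-finished stub,
    // the same window in which Mockito raises UnfinishedStubbingException.
    static boolean raceOnce() {
        stubbedAnswer = null;
        CountDownLatch midStubbing = new CountDownLatch(1);
        AtomicBoolean sawUnfinished = new AtomicBoolean();

        Thread smmthread = new Thread(() -> {
            try {
                midStubbing.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            if (stubbedAnswer == null) {
                sawUnfinished.set(true); // analogous to the Mockito failure
            }
        });
        smmthread.start();

        midStubbing.countDown(); // "test thread" is between when() and thenReturn()
        try {
            smmthread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        stubbedAnswer = Boolean.TRUE; // stubbing completes, but too late
        return sawUnfinished.get();
    }
}
```

The usual remedy in tests like this is to finish all stubbing before the monitored thread is started, or to stop the thread before restubbing.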
[jira] [Updated] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10306: - Description: This is a follow-up of [HDFS-10192]. The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the {{namesystem#inTransitionToActive()}}, while it should. According to the fix of [HDFS-10192], we should add this check to prevent the {{smmthread}} from calling {{leaveSafeMode()}} too early. was: Scenario: === write some blocks wait till roll edits happen Stop SNN Delete some blocks in ANN, wait till the blocks are deleted in DN also. restart the SNN and Wait till block reports come from datanode to SNN Kill ANN then make SNN to Active. > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > This is a follow-up of [HDFS-10192]. > The {{BlockManagerSafeMode$SafeModeMonitor#canLeave()}} is not checking the > {{namesystem#inTransitionToActive()}}, while it should. According to the fix > of [HDFS-10192], we should add this check to prevent the {{smmthread}} from > calling {{leaveSafeMode()}} too early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10306: - Fix Version/s: (was: 2.9.0) > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > Scenario: > === > write some blocks > wait till roll edits happen > Stop SNN > Delete some blocks in ANN, wait till the blocks are deleted in DN also. > restart the SNN and Wait till block reports come from datanode to SNN > Kill ANN then make SNN to Active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10306: - Hadoop Flags: (was: Reviewed) > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > > Scenario: > === > write some blocks > wait till roll edits happen > Stop SNN > Delete some blocks in ANN, wait till the blocks are deleted in DN also. > restart the SNN and Wait till block reports come from datanode to SNN > Kill ANN then make SNN to Active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
Mingliang Liu created HDFS-10306: Summary: SafeModeMonitor should not leave safe mode if name system is starting active service Key: HDFS-10306 URL: https://issues.apache.org/jira/browse/HDFS-10306 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Mingliang Liu Assignee: Brahma Reddy Battula Fix For: 2.9.0 Scenario: === write some blocks wait till roll edits happen Stop SNN Delete some blocks in ANN, wait till the blocks are deleted in DN also. restart the SNN and Wait till block reports come from datanode to SNN Kill ANN then make SNN to Active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-10306) SafeModeMonitor should not leave safe mode if name system is starting active service
[ https://issues.apache.org/jira/browse/HDFS-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu reassigned HDFS-10306: Assignee: Mingliang Liu (was: Brahma Reddy Battula) > SafeModeMonitor should not leave safe mode if name system is starting active > service > > > Key: HDFS-10306 > URL: https://issues.apache.org/jira/browse/HDFS-10306 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > > Scenario: > === > write some blocks > wait till roll edits happen > Stop SNN > Delete some blocks in ANN, wait till the blocks are deleted in DN also. > restart the SNN and Wait till block reports come from datanode to SNN > Kill ANN then make SNN to Active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9016) Display upgrade domain information in fsck
[ https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246308#comment-15246308 ] Ming Ma commented on HDFS-9016: --- Thanks, [~eddyxu]! It shouldn't change fsck's output format, given that upgrade domain isn't defined by default and couldn't be defined until HDFS-9005 became available. Given 2.8 hasn't been released yet, it seems ok as long as this jira is added to 2.8. [~aw], [~andrew.wang], thoughts? > Display upgrade domain information in fsck > -- > > Key: HDFS-9016 > URL: https://issues.apache.org/jira/browse/HDFS-9016 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9016.patch > > > This will make it easy for people to use fsck to check block placement when > upgrade domain is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10299) libhdfs++: File length doesn't always count the last block if it's being written to
[ https://issues.apache.org/jira/browse/HDFS-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-10299: --- Resolution: Fixed Status: Resolved (was: Patch Available) > libhdfs++: File length doesn't always count the last block if it's being > written to > --- > > Key: HDFS-10299 > URL: https://issues.apache.org/jira/browse/HDFS-10299 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Xiaowei Zhu > Attachments: HDFS-10299.HDFS-8707.000.patch > > > It looks like we aren't factoring in the last block of files that are being > written to or haven't been closed yet into the length of the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10232) Ozone: Make config key naming consistent
[ https://issues.apache.org/jira/browse/HDFS-10232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246272#comment-15246272 ] Arpit Agarwal commented on HDFS-10232: -- +1 thanks [~anu]. > Ozone: Make config key naming consistent > > > Key: HDFS-10232 > URL: https://issues.apache.org/jira/browse/HDFS-10232 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Trivial > Attachments: HDFS-10232-HDFS-7240.001.patch, > HDFS-10232-HDFS-7240.002.patch, HDFS-10232-HDFS-7240.003.patch > > > We seem to use StorageHandler, ozone, Objectstore etc as prefix. We should > pick one -- Ideally ozone and use that consistently as the prefix for the > ozone config management. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10232) Ozone: Make config key naming consistent
[ https://issues.apache.org/jira/browse/HDFS-10232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-10232: Attachment: HDFS-10232-HDFS-7240.003.patch Missed a comment from Arpit in earlier patch. Removed DFS prefix from Ozone Keys. > Ozone: Make config key naming consistent > > > Key: HDFS-10232 > URL: https://issues.apache.org/jira/browse/HDFS-10232 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Trivial > Attachments: HDFS-10232-HDFS-7240.001.patch, > HDFS-10232-HDFS-7240.002.patch, HDFS-10232-HDFS-7240.003.patch > > > We seem to use StorageHandler, ozone, Objectstore etc as prefix. We should > pick one -- Ideally ozone and use that consistently as the prefix for the > ozone config management. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9016) Display upgrade domain information in fsck
[ https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246223#comment-15246223 ] Allen Wittenauer commented on HDFS-9016: If it changes the output in any way/shape/form, it's not backward compatible. See http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Command_Line_Interface_CLI . > Display upgrade domain information in fsck > -- > > Key: HDFS-9016 > URL: https://issues.apache.org/jira/browse/HDFS-9016 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9016.patch > > > This will make it easy for people to use fsck to check block placement when > upgrade domain is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated HDFS-9958: -- Attachment: HDFS-9958.002.patch Updating patch per comments. Added test is now reliable. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of > LocatedBlock on a {{machines}} array element that is not populated. > Following is the root cause, > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state as NORMAL, which in the > case where corrupt replica is on failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch Assertion errors, > otherwise it goes ahead and tries to create a LocatedBlock object from an > element that was never put in the {{machines}} array. 
> Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
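The mismatch described in the report reduces to a few lines; the storage list and index bookkeeping below are simplified stand-ins for {{countNodes()}}/{{numNodes()}}, not the real {{BlockManager}} code.

```java
import java.util.List;
import java.util.Set;

class MachinesArraySketch {
    enum StorageState { NORMAL, FAILED }

    // Simplified picture of BlockManager#createLocatedBlock: corrupt slots
    // are counted over NORMAL storages only, while the machines array is
    // sized from every storage, so a corrupt replica on a FAILED storage
    // leaves a trailing null slot.
    static String[] buildMachines(List<StorageState> storages, Set<Integer> corrupt) {
        int numCorruptNodes = 0; // like countNodes(blk).corruptReplicas(): NORMAL only
        for (int i : corrupt) {
            if (storages.get(i) == StorageState.NORMAL) {
                numCorruptNodes++;
            }
        }
        int numNodes = storages.size(); // like blocksMap.numNodes(blk): any state
        String[] machines = new String[numNodes - numCorruptNodes];
        int j = 0;
        for (int i = 0; i < numNodes; i++) {
            if (!corrupt.contains(i)) { // corrupt replicas are skipped when filling
                machines[j++] = "storage-" + i;
            }
        }
        return machines; // a null element here is what later triggers the NPE
    }
}
```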
[jira] [Commented] (HDFS-9016) Display upgrade domain information in fsck
[ https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246203#comment-15246203 ] Lei (Eddy) Xu commented on HDFS-9016: - Hi, [~mingma] Thanks for the work. I +1 for the code. But it'd be better for [~aw] or [~andrew.wang] to comment the compatibility. > Display upgrade domain information in fsck > -- > > Key: HDFS-9016 > URL: https://issues.apache.org/jira/browse/HDFS-9016 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9016.patch > > > This will make it easy for people to use fsck to check block placement when > upgrade domain is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-10304) implement moveToLocal or remove it from the usage list
[ https://issues.apache.org/jira/browse/HDFS-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou reassigned HDFS-10304: Assignee: Xiaobing Zhou > implement moveToLocal or remove it from the usage list > -- > > Key: HDFS-10304 > URL: https://issues.apache.org/jira/browse/HDFS-10304 > Project: Hadoop HDFS > Issue Type: Improvement > Components: scripts >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Xiaobing Zhou >Priority: Minor > > if you get the usage list of {{hdfs dfs}} it tells you of "-moveToLocal". > If you try to use the command, it tells you off "Option '-moveToLocal' is not > implemented yet." > Either the command should be implemented, or it should be removed from the > usage list, as it is not technically a command you can use, except in the > special case of "I want my shell to print "not implemented yet"" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9530) huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246179#comment-15246179 ] Ravi Prakash commented on HDFS-9530: bq. 1. Reservation happens only when the block is being received using BlockReceiver. No other places reservation happens, so no need to release as well. Thanks for reminding me Brahma! Do you think we should change {{reservedForReplicas}} when a datanode is started up and an older RBW replica is recovered? Specifically [BlockPoolSlice.getVolumeMap|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L361] {{addToReplicasMap(volumeMap, rbwDir, lazyWriteReplicaMap, false);}} . Also it seems to me, since we aren't calling {{reserveSpaceForReplica}} in BlockReceiver but instead at a lower level, we will have to worry about calling {{releaseReservedSpace}} at that lower level. {quote}2. BlockReceiver constructor have a try-catch block where it will release all the bytes reserved, if there is any exceptions after reserving. 3. BlockReceiver#receiveBlock() have the try-catch block where it will release all the bytes reserved if there is any exceptions during the receiving process.{quote} Could you please point me to the code where you see this happening? I mean specific instances of {{FsVolumeImpl.releaseReservedSpace}} being called with the stack trace. bq. Only place left is in DataXceiver#writeBlock(), exception can happen after creation of BlockReceiver and before calling BlockReceiver#receiveBlock(), if failed to connect to Mirror nodes. Do you mean to imply that the places I found in [this comment|https://issues.apache.org/jira/browse/HDFS-9530?focusedCommentId=15231164=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15231164] need not call {{reserveSpaceForReplica}} / {{releaseReservedSpace}} ? 
> huge Non-DFS Used in hadoop 2.6.2 & 2.7.1 > - > > Key: HDFS-9530 > URL: https://issues.apache.org/jira/browse/HDFS-9530 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Fei Hui > Attachments: HDFS-9530-01.patch > > > i think there are bugs in HDFS > === > here is config > > dfs.datanode.data.dir > > > file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2 > > > here is dfsadmin report > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. > Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 238604832768 (222.22 GB) > DFS Remaining: 215772954624 (200.95 GB) > DFS Used: 22831878144 (21.26 GB) > DFS Used%: 9.57% > Under replicated blocks: 4 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7190958080 (6.70 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72343986176 (67.38 GB) > DFS Used%: 8.96% > DFS Remaining%: 90.14% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:02 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7219073024 (6.72 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72315871232 (67.35 GB) > DFS Used%: 9.00% > DFS Remaining%: 90.11% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > 
DFS Used: 8421847040 (7.84 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 71113097216 (66.23 GB) > DFS Used%: 10.49% > DFS Remaining%: 88.61% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > > when running hive job , dfsadmin report as follows > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. > Configured Capacity: 240769253376 (224.23 GB) > Present Capacity:
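The reserve/release pairing debated in the comment above can be sketched with a plain counter; {{reserveSpaceForReplica}}/{{releaseReservedSpace}} are represented by increments on a single field, and the try/finally shape is the invariant being argued for, not the actual Hadoop code path.

```java
import java.util.concurrent.atomic.AtomicLong;

class ReplicaSpaceSketch {
    private final AtomicLong reservedForReplicas = new AtomicLong();

    long reserved() {
        return reservedForReplicas.get();
    }

    // The invariant under discussion: every reservation made when a block
    // starts arriving must be released on every exit path, including
    // mirror-connect and mid-stream failures, or the reserved bytes linger
    // and show up as phantom non-DFS used space.
    void receiveBlock(long blockSize, Runnable transfer) {
        reservedForReplicas.addAndGet(blockSize);      // like reserveSpaceForReplica
        try {
            transfer.run();                            // may throw at any point
        } finally {
            reservedForReplicas.addAndGet(-blockSize); // like releaseReservedSpace
        }
    }
}
```

Any code path that reserves outside such a try/finally (for example, between constructing the receiver and starting the transfer) is exactly where a leak of reserved space can appear.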
[jira] [Commented] (HDFS-10256) Use GenericTestUtils.getTestDir method in tests for temporary directories
[ https://issues.apache.org/jira/browse/HDFS-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246137#comment-15246137 ] Kihwal Lee commented on HDFS-10256: --- The entire base dir is supposed to be wiped out by the shutdown hook. The space won't get cleaned up between test cases running on the same jvm, but I thought the space usage increase would be negligible. I will check it out further. > Use GenericTestUtils.getTestDir method in tests for temporary directories > - > > Key: HDFS-10256 > URL: https://issues.apache.org/jira/browse/HDFS-10256 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build, test >Reporter: Vinayakumar B >Assignee: Vinayakumar B > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.
[ https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-10305: -- Description: Currently Hdfs audit logs mkdir operation even if the directory already exists. This creates confusion while analyzing audit logs. was: Currently Hdfs audit logs mkdir operation if the directory already exists. This creates confusion while analyzing audit logs. > Hdfs audit shouldn't log mkdir operation if the directory already exists. > > > Key: HDFS-10305 > URL: https://issues.apache.org/jira/browse/HDFS-10305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Minor > > Currently Hdfs audit logs mkdir operation even if the directory already > exists. > This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.
[ https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-10305: -- Summary: Hdfs audit shouldn't log mkdir operation if the directory already exists. (was: Hdfs audit shouldn't log if the directory already exists.) > Hdfs audit shouldn't log mkdir operation if the directory already exists. > > > Key: HDFS-10305 > URL: https://issues.apache.org/jira/browse/HDFS-10305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Minor > > Hdfs audit logs mkdir operation if the directory already exists. > This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.
[ https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-10305: -- Description: Currently Hdfs audit logs mkdir operation if the directory already exists. This creates confusion while analyzing audit logs. was: Hdfs audit logs mkdir operation if the directory already exists. This creates confusion while analyzing audit logs. > Hdfs audit shouldn't log mkdir operation if the directory already exists. > > > Key: HDFS-10305 > URL: https://issues.apache.org/jira/browse/HDFS-10305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Minor > > Currently Hdfs audit logs mkdir operation if the directory already exists. > This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10305) Hdfs audit shouldn't log if the directory already exists.
Rushabh S Shah created HDFS-10305: - Summary: Hdfs audit shouldn't log if the directory already exists. Key: HDFS-10305 URL: https://issues.apache.org/jira/browse/HDFS-10305 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Rushabh S Shah Assignee: Rushabh S Shah Priority: Minor Hdfs audit logs mkdir operation if the directory already exists. This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
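The proposed behavior can be sketched with an in-memory stand-in; the audit format and mkdirs semantics below are simplified, not FSNamesystem's actual {{logAuditEvent}}.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MkdirAuditSketch {
    private final Set<String> dirs = new HashSet<>();
    private final List<String> auditLog = new ArrayList<>();

    // Proposed behavior: emit an audit entry only when mkdirs actually
    // creates the directory; a call on an existing directory still
    // succeeds but stays out of the audit log.
    boolean mkdirs(String path) {
        boolean created = dirs.add(path);
        if (created) {
            auditLog.add("cmd=mkdirs src=" + path);
        }
        return true; // mkdirs is idempotent, like FileSystem#mkdirs
    }

    List<String> auditLog() {
        return auditLog;
    }
}
```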
[jira] [Commented] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag
[ https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246061#comment-15246061 ] Hadoop QA commented on HDFS-10265: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 47s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 314 unchanged - 1 fixed = 315 total (was 315) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 58s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 51m 55s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 146m 5s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | |
[jira] [Commented] (HDFS-10256) Use GenericTestUtils.getTestDir method in tests for temporary directories
[ https://issues.apache.org/jira/browse/HDFS-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246047#comment-15246047 ] Vinayakumar B commented on HDFS-10256: -- bq. Can we actually make sure each MiniDFSCluster gets a unique base directory? v2 patch by [~ste...@apache.org] in HADOOP-12984 had actually done this. But as mentioned by [~cnauroth] [here|https://issues.apache.org/jira/browse/HADOOP-12984?focusedCommentId=14969932=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14969932] in HADOOP-12984 it had a side effect of using a lot of disk space at the end of a complete test run. Do you think cleanup is not happening properly? > Use GenericTestUtils.getTestDir method in tests for temporary directories > - > > Key: HDFS-10256 > URL: https://issues.apache.org/jira/browse/HDFS-10256 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build, test >Reporter: Vinayakumar B >Assignee: Vinayakumar B > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
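The unique-base-directory idea discussed above can be sketched in a few lines. This is only an illustration: getTestDir below is a simplified stand-in for GenericTestUtils.getTestDir, and uniqueClusterDir is a hypothetical helper name, not the actual HADOOP-12984 patch.

```java
// Sketch: give each MiniDFSCluster its own base directory so concurrent or
// leftover clusters cannot collide. Names here are illustrative stand-ins.
import java.io.File;
import java.util.UUID;

public class UniqueDirSketch {
  // Resolves a name under the build's test data directory, in the spirit of
  // GenericTestUtils.getTestDir(String).
  static File getTestDir(String name) {
    return new File(System.getProperty("test.build.data", "target/test/data"), name);
  }

  // A random suffix makes every cluster instance land in a distinct directory.
  // The trade-off is the disk-space growth mentioned in the comment above,
  // unless each test deletes its directory when it finishes.
  static File uniqueClusterDir() {
    return getTestDir("dfs-" + UUID.randomUUID());
  }

  public static void main(String[] args) {
    File a = uniqueClusterDir();
    File b = uniqueClusterDir();
    System.out.println("distinct: " + !a.getPath().equals(b.getPath())); // distinct: true
  }
}
```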
[jira] [Commented] (HDFS-10304) implement moveToLocal or remove it from the usage list
[ https://issues.apache.org/jira/browse/HDFS-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246040#comment-15246040 ] Steve Loughran commented on HDFS-10304: --- {code} $ hdfs dfs Usage: hadoop fs [generic options] [-appendToFile ... ] [-cat [-ignoreCrc] ...] [-checksum ...] [-chgrp [-R] GROUP PATH...] [-chmod [-R]PATH...] [-chown [-R] [OWNER][:[GROUP]] PATH...] [-copyFromLocal [-f] [-p] [-l] ... ] [-copyToLocal [-p] [-ignoreCrc] [-crc] ... ] [-count [-q] [-h] [-v] [-t []] ...] [-cp [-f] [-p | -p[topax]] ... ] [-createSnapshot []] [-deleteSnapshot ] [-df [-h] [ ...]] [-du [-s] [-h] ...] [-expunge] [-find ... ...] [-get [-p] [-ignoreCrc] [-crc] ... ] [-getfacl [-R] ] [-getfattr [-R] {-n name | -d} [-e en] ] [-getmerge [-nl] ] [-help [cmd ...]] [-ls [-d] [-h] [-R] [ ...]] [-mkdir [-p] ...] [-moveFromLocal ... ] [-moveToLocal ] [-mv ... ] [-put [-f] [-p] [-l] ... ] [-renameSnapshot ] [-rm [-f] [-r|-R] [-skipTrash] [-safely] ...] [-rmdir [--ignore-fail-on-non-empty] ...] [-setfacl [-R] [{-b|-k} {-m|-x } ]|[--set ]] [-setfattr {-n name [-v value] | -x name} ] [-setrep [-R] [-w] ...] [-stat [format] ...] [-tail [-f] ] [-test -[defsz] ] [-text [-ignoreCrc] ...] [-touchz ...] [-truncate [-w] ...] [-usage [cmd ...]] Generic options supported are -conf specify an application configuration file -D
[jira] [Created] (HDFS-10304) implement moveToLocal or remove it from the usage list
Steve Loughran created HDFS-10304: - Summary: implement moveToLocal or remove it from the usage list Key: HDFS-10304 URL: https://issues.apache.org/jira/browse/HDFS-10304 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Affects Versions: 2.8.0 Reporter: Steve Loughran Priority: Minor If you get the usage list of {{hdfs dfs}}, it tells you of "-moveToLocal". If you try to use the command, it tells you "Option '-moveToLocal' is not implemented yet." Either the command should be implemented, or it should be removed from the usage list, as it is not technically a command you can use, except in the special case of "I want my shell to print "not implemented yet"" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246011#comment-15246011 ] Anu Engineer commented on HDFS-9543: [~eddyxu] Thank you for the code review comments. Please see some of my thoughts on your suggestions. bq.Could you put getNextBlock() logic into a separate Iterator, and make it Closable, which will include getBlockToCopy(), openPoolIters(), getNextBlock(), closePoolIters(). There are a few drawbacks to separating them into different functions. I thought that this is an excellent idea, and I explored this path of code organization. But I got stuck in an intractable problem which makes me want to abandon this path. Please do let me know if you have suggestions on how you think I can solve this. In supporting an Iterator interface, we have to support hasNext. In our case that is a scan of the block list to find a block that is smaller than the required move size. There are 2 ways to do this: look for the block and report true if found, but there is no guarantee that it will indeed be returned by the next() call, since these blocks can go away underneath us. So a common pattern of iterator code becomes complex to write -- the while(hasNext()) next() pattern now needs to worry about next() failing even when hasNext() has been successful. We can keep a pointer to the found block in memory, and return that in the next call, but that means that we have to do some unnecessary block state management in the iterator. I ended up writing all this, found that the code was getting more complex instead of simpler, and decided to abandon this approach. bq. 1) The states (i.e. poolIndex,) are stored outside these functions, the caller needs maintain these states. These are part of the BlockMover class, and copyBlocks is the only call made by other classes. So it is not visible to the caller at all. bq. 2) poolIndex is never initialized and is not able be reset. 
PoolIndex is an index into a circular array; if you like I can initialize this to 0, but in most cases we just move to the next block pool and get the next block. Before each block fetch we init the count variable so that we know if we have visited all the block pools. In other words, users should not be able to see this, nor need to reset this variable. bq. Please always log the IOEs. And I think it is better to throw IOE here as well as many other places. I will log a debug trace here. The reason why we are not throwing is that we might encounter a damaged block when we try to move a large number of blocks from one volume to another. Instead of failing or aborting the action we keep a count of the errors we have encountered. We will just ignore that block and continue copying, until we hit max_error_counts. For each move -- that is a source disk, destination disk, bytes to move -- you can specify the max error count. Hence we are not throwing but keeping track of the error count. bq. Can it be a private static List openPoolIters() ? Since we are not doing the iterator interface, I am skipping this too. bq. In a few such places, should we actually break the while loop? Wouldn't continue here just generate a lot of LOGS and spend CPU cycles? You are right, and I did think of writing a break here. The reason that I chose continue over break was this. It is easier to reason about a loop if it has only one exit point. With break, you have to reason about all exit points. This loop is a pretty complicated one since it can exit in many ways. Here are some: # when we meet the maximum error count. # if we have reached close enough to the bytes to move -- say close to 10% of the target # if we get an interrupt call from the client. # if we get a shutdown call. # if we are not able to find any blocks to move. # if we are out of destination space. So instead of making people reason about each of these, we set an exit flag and loop back up. 
Since the while condition will turn false, the loop has one single exit, and will exit without any extra logging. bq. Why do you need to change float to double. In this case, wouldn't float good enough ? This is a stylistic change; the Java docs recommend double as the default choice over float. Hence this fix. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
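The continue-over-break pattern described in the comment above can be illustrated with a toy sketch. All names are hypothetical, and the per-iteration work is faked; this is not the actual DiskBalancer code. Every termination condition sets a flag and continues, so the while condition is the loop's only exit point.

```java
// Sketch of a single-exit copy loop: each exit condition sets shouldRun=false
// and continues back to the top, so there is exactly one place the loop ends.
public class CopyLoopSketch {
  // Returns the number of loop iterations, including the final check-only pass.
  static int copyBlocks(int maxErrors, long bytesToMove) {
    boolean shouldRun = true;
    long moved = 0;
    int errors = 0;      // would be incremented on damaged blocks in real code
    int iterations = 0;
    while (shouldRun) {
      iterations++;
      if (errors >= maxErrors) { shouldRun = false; continue; }   // exit: too many errors
      if (moved >= bytesToMove) { shouldRun = false; continue; }  // exit: moved enough
      // ... interrupt / shutdown / no-blocks / out-of-space checks go here,
      // each setting shouldRun=false and continuing, never break-ing ...
      moved += 10;  // pretend we copied a 10-byte block
    }
    return iterations;
  }

  public static void main(String[] args) {
    // 100 bytes at 10 per pass: 10 copy passes plus 1 final check-only pass.
    System.out.println("iterations=" + copyBlocks(5, 100)); // iterations=11
  }
}
```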
[jira] [Commented] (HDFS-10296) FileContext.getDelegationTokens() fails to obtain KMS delegation token
[ https://issues.apache.org/jira/browse/HDFS-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246008#comment-15246008 ] Wei-Chiu Chuang commented on HDFS-10296: Thanks for providing a code snippet to demonstrate the issue, Andreas. Please note that in your case, if the default FS is a local file system, it will not have delegation tokens. While you reported this on a CDH Hadoop, this same behavior also holds for Apache Hadoop. > FileContext.getDelegationTokens() fails to obtain KMS delegation token > -- > > Key: HDFS-10296 > URL: https://issues.apache.org/jira/browse/HDFS-10296 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.6.0 > Environment: CDH 5.6 with a Java KMS >Reporter: Andreas Neumann > > This little program demonstrates the problem: With FileSystem, we can get > both the HDFS and the kms-dt token, whereas with FileContext, we can only > obtain the HDFS delegation token. > {code} > public class SimpleTest { > public static void main(String[] args) throws IOException { > YarnConfiguration hConf = new YarnConfiguration(); > String renewer = "renewer"; > FileContext fc = FileContext.getFileContext(hConf); > List<Token<?>> tokens = fc.getDelegationTokens(new Path("/"), renewer); > for (Token<?> token : tokens) { > System.out.println("Token from FC: " + token); > } > FileSystem fs = FileSystem.get(hConf); > for (Token<?> token : fs.addDelegationTokens(renewer, new Credentials())) > { > System.out.println("Token from FS: " + token); > } > } > } > {code} > Sample output (host/user name x'ed out): > {noformat} > Token from FC: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: > (HDFS_DELEGATION_TOKEN token 49 for xxx) > Token from FS: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: > (HDFS_DELEGATION_TOKEN token 50 for xxx) > Token from FS: Kind: kms-dt, Service: xx.xx.xx.xx:16000, Ident: 00 04 63 64 > 61 70 07 72 65 6e 65 77 65 72 00 8a 01 54 16 96 c2 95 8a 01 54 3a a3 46 95 0e > 02 > {noformat} > 
Apparently FileContext does not return the KMS token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245867#comment-15245867 ] Daryn Sharp commented on HDFS-10301: Enabling HDFS-9198 will process BRs FIFO. It doesn't solve this implementation bug, but it virtually eliminates its occurrence. > Blocks removed by thousands due to falsely detected zombie storages > --- > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: zombieStorageLogs.rtf > > > When the NameNode is busy a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can then interleave processing of storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag
[ https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Chang updated HDFS-10265: - Status: Open (was: Patch Available) > OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag > - > > Key: HDFS-10265 > URL: https://issues.apache.org/jira/browse/HDFS-10265 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1, 2.4.1 >Reporter: Wan Chang >Assignee: Wan Chang >Priority: Minor > Labels: patch > Attachments: HDFS-10265-001.patch > > > I use the OEV tool to convert an editlog to an xml file, then convert the xml file back > to a binary editlog file (so that a low-version NameNode can load edits > generated by a higher-version NameNode). But when OP_UPDATE_BLOCKS has no BLOCK > tag, the OEV tool doesn't handle the case and exits with InvalidXmlException. > Here is the stack: > {code} > fromXml error decoding opcode null > {{"/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5"}, > {"-2"}, {}, > {"3875711"}} > Encountered exception. Exiting: no entry found for BLOCK > org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for > BLOCK > at > org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:908) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:3942) > ... > {code} > Here is part of the xml file: > {code} > <RECORD> > <OPCODE>OP_UPDATE_BLOCKS</OPCODE> > <DATA> > <TXID>3875711</TXID> > <PATH>/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5</PATH> > <RPC_CLIENTID></RPC_CLIENTID> > <RPC_CALLID>-2</RPC_CALLID> > </DATA> > </RECORD> > {code} > I tracked the NN's log and found these operations: > 0. The file > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 is > very small and contains only one block. > 1. Client asked NN to add a block to the file. > 2. Client failed to write to DN and asked NameNode to abandon the block. > 3. 
NN removed the block and wrote an OP_UPDATE_BLOCKS to the editlog > Finally NN generated an OP_UPDATE_BLOCKS with no BLOCK tags. > In FSEditLogOp$UpdateBlocksOp.fromXml, we need to handle the case above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
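The lenient handling the description asks for can be sketched as follows. Stanza here is a tiny toy stand-in for org.apache.hadoop.hdfs.util.XMLUtils$Stanza, and getChildrenOrEmpty is an illustrative method name, not the actual HDFS-10265 patch.

```java
// Sketch: when an OP_UPDATE_BLOCKS record has no BLOCK children, decode it as
// an empty block list instead of throwing InvalidXmlException.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UpdateBlocksSketch {
  // Toy stand-in for XMLUtils.Stanza, which maps child tag names to entries.
  static class Stanza {
    Map<String, List<String>> children = new HashMap<>();

    // Lenient lookup: absent tag means "zero children", unlike getChildren,
    // which throws "no entry found for BLOCK" for a missing tag.
    List<String> getChildrenOrEmpty(String name) {
      List<String> c = children.get(name);
      return c == null ? new ArrayList<>() : c;
    }
  }

  public static void main(String[] args) {
    Stanza op = new Stanza();  // an OP_UPDATE_BLOCKS record with no BLOCK tag
    List<String> blocks = op.getChildrenOrEmpty("BLOCK");
    System.out.println("blocks=" + blocks.size()); // blocks=0, no exception
  }
}
```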
[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag
[ https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Chang updated HDFS-10265: - Status: Patch Available (was: Open) submit patch > OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag > - > > Key: HDFS-10265 > URL: https://issues.apache.org/jira/browse/HDFS-10265 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1, 2.4.1 >Reporter: Wan Chang >Assignee: Wan Chang >Priority: Minor > Labels: patch > Attachments: HDFS-10265-001.patch, HDFS-10265-002.patch > > > I use OEV tool to convert editlog to xml file, then convert the xml file back > to binary editslog file(so that low version NameNode can load edits that > generated by higher version NameNode). But when OP_UPDATE_BLOCKS has no BLOCK > tag, the OEV tool doesn't handle the case and exits with InvalidXmlException. > Here is the stack: > {code} > fromXml error decoding opcode null > {{"/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5"}, > {"-2"}, {}, > {"3875711"}} > Encountered exception. Exiting: no entry found for BLOCK > org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for > BLOCK > at > org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:908) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:3942) > ... > {code} > Here is part of the xml file: > {code} > > OP_UPDATE_BLOCKS > > 3875711 > > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 > > -2 > > > {code} > I tracked the NN's log and found those operation: > 0. The file > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 is > very small and contains only one block. > 1. Client ask NN to add block to the file. > 2. Client failed to write to DN and asked NameNode to abandon block. > 3. 
NN remove the block and write an OP_UPDATE_BLOCKS to editlog > Finally NN generated a OP_UPDATE_BLOCKS with no BLOCK tags. > In FSEditLogOp$UpdateBlocksOp.fromXml, we need to handle the case above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag
[ https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Chang updated HDFS-10265: - Attachment: HDFS-10265-002.patch > OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag > - > > Key: HDFS-10265 > URL: https://issues.apache.org/jira/browse/HDFS-10265 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.4.1, 2.7.1 >Reporter: Wan Chang >Assignee: Wan Chang >Priority: Minor > Labels: patch > Attachments: HDFS-10265-001.patch, HDFS-10265-002.patch > > > I use OEV tool to convert editlog to xml file, then convert the xml file back > to binary editslog file(so that low version NameNode can load edits that > generated by higher version NameNode). But when OP_UPDATE_BLOCKS has no BLOCK > tag, the OEV tool doesn't handle the case and exits with InvalidXmlException. > Here is the stack: > {code} > fromXml error decoding opcode null > {{"/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5"}, > {"-2"}, {}, > {"3875711"}} > Encountered exception. Exiting: no entry found for BLOCK > org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for > BLOCK > at > org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:908) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:3942) > ... > {code} > Here is part of the xml file: > {code} > > OP_UPDATE_BLOCKS > > 3875711 > > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 > > -2 > > > {code} > I tracked the NN's log and found those operation: > 0. The file > /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 is > very small and contains only one block. > 1. Client ask NN to add block to the file. > 2. Client failed to write to DN and asked NameNode to abandon block. > 3. 
NN remove the block and write an OP_UPDATE_BLOCKS to editlog > Finally NN generated a OP_UPDATE_BLOCKS with no BLOCK tags. > In FSEditLogOp$UpdateBlocksOp.fromXml, we need to handle the case above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10303) DataStreamer#ResponseProcessor calculate packet acknowledge duration wrongly.
[ https://issues.apache.org/jira/browse/HDFS-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245720#comment-15245720 ] Surendra Singh Lilhore commented on HDFS-10303: --- I am getting the "Slow ReadProcessor read" log in my cluster when I increase the socket timeout for the client. {noformat} 16/04/14 17:57:59 WARN DataStreamer: Slow ReadProcessor read fields for block BP-873267638-192.168.100.12-1460002479721:blk_1073752739_11917 took 47858ms (threshold=3ms); ack: seqno: 3 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 803180 flag: 0 flag: 0 flag: 0, targets: [DatanodeInfoWithStorage[192.168.100.9:25009,DS-d552bfd7-1c38-430d-8703-c3b539caf351,DISK], DatanodeInfoWithStorage[192.168.100.11:25009,DS-02897c9b-bceb-4790-b08a-f711d8e3fd81,DISK], DatanodeInfoWithStorage[192.168.100.10:25009,DS-fae7b497-a269-4614-afe5-7006660eafcf,DISK]] {noformat} But when I checked the packet send time, it is the same as the packet acknowledge time: {noformat} 16/04/14 17:57:59 DEBUG DataStreamer: DataStreamer block BP-873267638-192.168.100.12-1460002479721:blk_1073752739_11917 sending packet packet seqno: 3 offsetInBlock: 8704 lastPacketInBlock: false lastByteOffsetInBlock: 12316 {noformat} This happens because {{ResponseProcessor}} sets the current time as the begin time and waits for the packet ack; after getting the ack it calculates the duration and compares it with {{dfs.client.slow.io.warning.threshold.ms}}. {code} // read an ack from the pipeline long begin = Time.monotonicNow(); ack.readFields(blockReplyStream); long duration = Time.monotonicNow() - begin; {code} Suppose the client sends two packets and then has no data to write; after some time it gets more data and sends a third packet. The client waited for some time after sending the second packet. The time between the second packet and the third packet should not be counted by {{ResponseProcessor}} in the packet acknowledge duration. 
> DataStreamer#ResponseProcessor calculate packet acknowledge duration wrongly. > - > > Key: HDFS-10303 > URL: https://issues.apache.org/jira/browse/HDFS-10303 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.2 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > > Packets acknowledge duration should be calculated based on the packet send > time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
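The proposed fix — measure the ack duration from the packet's send time rather than from when ResponseProcessor starts waiting — can be sketched as below. PacketTimer and its method names are illustrative, not the DataStreamer API or the actual HDFS-10303 patch.

```java
// Sketch: record each packet's send time keyed by seqno; when its ack arrives,
// the duration is (ack time - send time), so idle time before the packet was
// even sent is never counted toward the "Slow ReadProcessor" warning.
import java.util.HashMap;
import java.util.Map;

public class PacketTimer {
  private final Map<Long, Long> sendTimeBySeqno = new HashMap<>();

  // Called by the sender thread when a packet goes out.
  void packetSent(long seqno, long nowMs) {
    sendTimeBySeqno.put(seqno, nowMs);
  }

  // Called when the ack for seqno is read; returns the send-to-ack duration,
  // or 0 for an unknown seqno (nothing to measure).
  long ackReceived(long seqno, long nowMs) {
    Long sent = sendTimeBySeqno.remove(seqno);
    return sent == null ? 0 : nowMs - sent;
  }

  public static void main(String[] args) {
    PacketTimer t = new PacketTimer();
    t.packetSent(3, 1000L);
    long d = t.ackReceived(3, 1500L);   // ack arrives 500ms after the send
    System.out.println("duration=" + d); // duration=500
  }
}
```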
[jira] [Created] (HDFS-10303) DataStreamer#ResponseProcessor calculate packet acknowledge duration wrongly.
Surendra Singh Lilhore created HDFS-10303: - Summary: DataStreamer#ResponseProcessor calculate packet acknowledge duration wrongly. Key: HDFS-10303 URL: https://issues.apache.org/jira/browse/HDFS-10303 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.7.2 Reporter: Surendra Singh Lilhore Assignee: Surendra Singh Lilhore Packets acknowledge duration should be calculated based on the packet send time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245677#comment-15245677 ] Xinwei Qin commented on HDFS-7859: --- [~rakeshr] [~drankye], and [~zhz], thanks for your comments and clarifications. Now it is a good time to update this patch, though we should be more clear about the details of custom policies. I am glad to rebase the patch with the latest code and may attach it tomorrow. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9271) Implement basic NN operations
[ https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-9271: -- Attachment: HDFS-9271.HDFS-8707.000.patch First patch, half baked and not ready to use. Uploading to keep it around until I have some more time to finish it up. -Added blocking rpc calls to the protobuf stub generator -Started adding wrappers to populate protobuf messages handed off to the rpc calls > Implement basic NN operations > - > > Key: HDFS-9271 > URL: https://issues.apache.org/jira/browse/HDFS-9271 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: James Clampffer > Attachments: HDFS-9271.HDFS-8707.000.patch > > > Expose via C and C++ API: > * mkdirs > * rename > * delete > * stat > * chmod > * chown > * getListing > * setOwner > * fsync -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10299) libhdfs++: File length doesn't always count the last block if it's being written to
[ https://issues.apache.org/jira/browse/HDFS-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245660#comment-15245660 ] James Clampffer commented on HDFS-10299: Committed this to HDFS-8707. We should add some more tests that include concurrency between reads and writes to try and catch this stuff sooner. > libhdfs++: File length doesn't always count the last block if it's being > written to > --- > > Key: HDFS-10299 > URL: https://issues.apache.org/jira/browse/HDFS-10299 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Xiaowei Zhu > Attachments: HDFS-10299.HDFS-8707.000.patch > > > It looks like we aren't factoring in the last block of files that are being > written to or haven't been closed yet into the length of the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10299) libhdfs++: File length doesn't always count the last block if it's being written to
[ https://issues.apache.org/jira/browse/HDFS-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-10299: --- Summary: libhdfs++: File length doesn't always count the last block if it's being written to (was: libhdfs++: File length doesn't always going the last block if it's being written to) > libhdfs++: File length doesn't always count the last block if it's being > written to > --- > > Key: HDFS-10299 > URL: https://issues.apache.org/jira/browse/HDFS-10299 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Xiaowei Zhu > Attachments: HDFS-10299.HDFS-8707.000.patch > > > It looks like we aren't factoring in the last block of files that are being > written to or haven't been closed yet into the length of the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245624#comment-15245624 ] Hudson commented on HDFS-10275: --- FAILURE: Integrated in Hadoop-trunk-Commit #9626 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9626/]) HDFS-10275. TestDataNodeMetrics failing intermittently due to (waltersu4549: rev ab903029a9d353677184ff5602966b11ffb408b9) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java > TestDataNodeMetrics failing intermittently due to TotalWriteTime counted > incorrectly > > > Key: HDFS-10275 > URL: https://issues.apache.org/jira/browse/HDFS-10275 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.7.3 > > Attachments: HDFS-10275.001.patch > > > The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info > shows this: > {code} > Results : > Failed tests: > > TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 > expected: but was: > Tests in error: > TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for > Min... > TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting > for ... > TestHFlush.testHFlushInterrupted ? IO The stream is closed > {code} > In line 279 in {{TestDataNodeMetrics}}, a timeout takes place. Then I > looked into the code and found the real reason is that the > {{TotalWriteTime}} metric frequently counts 0 in each iteration of creating a > file, and this leads to retry operations until timeout. > I debugged the test locally. I found the most likely reason that causes the > {{TotalWriteTime}} metric to always count 0 is that we are using the > {{SimulatedFSDataset}} for the time-spent test. 
In {{SimulatedFSDataset}}, it > will use the inner class's method {{SimulatedOutputStream#write}} to count > the write time, and the method of this class just updates the {{length}} and > throws its data away. > {code} > @Override > public void write(byte[] b, > int off, > int len) throws IOException { > length += len; > } > {code} > So the writing operation hardly costs any time. So we should use a real > way to create the file instead of the simulated way. I have tested locally that > the test passes in just one attempt when I remove the simulated way, while the > test retries many times to count write time in the old way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value
[ https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245623#comment-15245623 ] Hudson commented on HDFS-10302: --- FAILURE: Integrated in Hadoop-trunk-Commit #9626 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9626/]) HDFS-10302. BlockPlacementPolicyDefault should use default replication (kihwal: rev d8b729e16fb253e6c84f414d419b5663d9219a43) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java > BlockPlacementPolicyDefault should use default replication considerload value > - > > Key: HDFS-10302 > URL: https://issues.apache.org/jira/browse/HDFS-10302 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-10302.001.patch > > > Now in method {{BlockPlacementPolicyDefault#initialize}}, it just uses the value > {{true}} as the replication considerload default value rather than using the > existing constant > {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}. 
> {code} > @Override > public void initialize(Configuration conf, FSClusterStats stats, > NetworkTopology clusterMap, > Host2NodesMap host2datanodeMap) { > this.considerLoad = conf.getBoolean( > DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true); > this.considerLoadFactor = conf.getDouble( > DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR, > DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT); > this.stats = stats; > this.clusterMap = clusterMap; > this.host2datanodeMap = host2datanodeMap; > this.heartbeatInterval = conf.getLong( > DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, > DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000; > this.tolerateHeartbeatMultiplier = conf.getInt( > DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY, > DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT); > this.staleInterval = conf.getLong( > DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY, > DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT); > this.preferLocalNode = conf.getBoolean( > DFSConfigKeys. > DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY, > DFSConfigKeys. > > DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT); > } > {code} > And now the value {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is not > used in any place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
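The change itself is small: read the considerLoad flag with the named default constant instead of a literal {{true}}. A standalone sketch of the idea follows; the key string and stand-in config class here are illustrative, not the real DFSConfigKeys or Configuration API.

```java
// Sketch: keep the default in one named constant so conf lookups never repeat
// a magic literal, mirroring the HDFS-10302 change.
import java.util.Properties;

public class ConsiderLoadSketch {
  // Stand-ins for DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY
  // and DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT.
  static final String CONSIDERLOAD_KEY = "dfs.namenode.replication.considerLoad";
  static final boolean CONSIDERLOAD_DEFAULT = true;

  // Stand-in for Configuration.getBoolean(key, default).
  static boolean getBoolean(Properties conf, String key, boolean def) {
    String v = conf.getProperty(key);
    return v == null ? def : Boolean.parseBoolean(v);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Before: getBoolean(conf, CONSIDERLOAD_KEY, true)      -- magic literal
    // After:  the default lives in one place; changing it is a one-line edit
    boolean considerLoad = getBoolean(conf, CONSIDERLOAD_KEY, CONSIDERLOAD_DEFAULT);
    System.out.println("considerLoad=" + considerLoad); // considerLoad=true
  }
}
```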
[jira] [Updated] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value
[ https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-10302: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available)
> BlockPlacementPolicyDefault should use default replication considerload value
> -
> Key: HDFS-10302
> URL: https://issues.apache.org/jira/browse/HDFS-10302
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.7.1
> Reporter: Lin Yiqun
> Assignee: Lin Yiqun
> Priority: Trivial
> Fix For: 2.8.0
> Attachments: HDFS-10302.001.patch
>
> Currently, {{BlockPlacementPolicyDefault#initialize}} passes the literal {{true}} as the default for the replication considerload setting rather than using the existing constant {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
> {code}
> @Override
> public void initialize(Configuration conf, FSClusterStats stats,
>     NetworkTopology clusterMap, Host2NodesMap host2datanodeMap) {
>   this.considerLoad = conf.getBoolean(
>       DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
>   this.considerLoadFactor = conf.getDouble(
>       DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
>       DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
>   this.stats = stats;
>   this.clusterMap = clusterMap;
>   this.host2datanodeMap = host2datanodeMap;
>   this.heartbeatInterval = conf.getLong(
>       DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
>       DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
>   this.tolerateHeartbeatMultiplier = conf.getInt(
>       DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
>       DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
>   this.staleInterval = conf.getLong(
>       DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY,
>       DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
>   this.preferLocalNode = conf.getBoolean(
>       DFSConfigKeys.DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
>       DFSConfigKeys.DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
> }
> {code}
> As a result, {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is not used anywhere.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
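The fix itself is a one-line change at the call site. A minimal self-contained sketch of the intended pattern, assuming the usual Hadoop convention of paired `_KEY`/`_DEFAULT` constants (the `getBoolean` helper below is a stand-in for `Configuration.getBoolean`, not the real class):

```java
import java.util.HashMap;
import java.util.Map;

public class ConsiderLoadDefault {
    // Mirrors the DFSConfigKeys convention: every config key has a paired
    // default constant, so call sites never hard-code the literal value.
    static final String DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY =
        "dfs.namenode.replication.considerLoad";
    static final boolean DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT = true;

    // Stand-in for Hadoop's Configuration.getBoolean(key, defaultValue).
    static boolean getBoolean(Map<String, String> conf,
                              String key, boolean defaultValue) {
        String v = conf.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // After the patch: the default comes from the constant, not a literal.
        boolean considerLoad = getBoolean(conf,
            DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY,
            DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT);
        System.out.println(considerLoad);  // true when the key is unset
    }
}
```

Using the constant keeps the behavior identical while making the default value discoverable and consistent with every other key in the file.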
[jira] [Commented] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value
[ https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245614#comment-15245614 ] Kihwal Lee commented on HDFS-10302: --- Committed this to trunk, branch-2 and branch-2.8. Thanks for fixing this, [~linyiqun].
[jira] [Commented] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value
[ https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245607#comment-15245607 ] Kihwal Lee commented on HDFS-10302: --- +1
[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver
[ https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245601#comment-15245601 ] Andras Bokor commented on HDFS-10264: - [~shv] Since the migration from Log4j/Commons Logging to SLF4J is gradually in progress, I suggest using the following format:
{code}
LOG.info("Saving image file {} using {}.", newFile, compression);
{code}
{code}
LOG.info("Image file {} of size {} bytes saved in {} seconds.", newFile, newFile.length(), (now() - startTime)/1000);
{code}
Also, the type of the LOG variable needs to be changed. What do you think?
> Logging improvements in FSImageFormatProtobuf.Saver
> ---
> Key: HDFS-10264
> URL: https://issues.apache.org/jira/browse/HDFS-10264
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.0
> Reporter: Konstantin Shvachko
> Assignee: Xiaobing Zhou
> Labels: newbie
>
> There are two LOG messages in {{FSImageFormat.Saver}}, marking the start and end of fsimage saving, that are missing in {{FSImageFormatProtobuf.Saver}}. It would be good to have them logged for protobuf images as well.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
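The benefit of the parameterized SLF4J style is that the message string is only assembled when the log level is enabled. A minimal self-contained illustration of the `{}` placeholder substitution SLF4J performs (this is a simplified stand-in, not the real `org.slf4j.helpers.MessageFormatter`):

```java
public class Slf4jStyleDemo {
    // Simplified version of SLF4J's "{}" placeholder substitution. With the
    // parameterized form, this work is deferred until the logger decides the
    // message will actually be emitted, unlike eager string concatenation.
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = pattern.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(pattern, from, at).append(args[argIdx++]);
            from = at + 2;  // skip past the "{}" placeholder
        }
        return sb.append(pattern.substring(from)).toString();
    }

    public static void main(String[] args) {
        System.out.println(
            format("Saving image file {} using {}.", "fsimage_0000042", "GZIP"));
    }
}
```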
[jira] [Commented] (HDFS-10276) Different results for exist call for file.ext/name
[ https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245590#comment-15245590 ] Kevin Cox commented on HDFS-10276: -- Oops, my mistake. That chmod should be 666 (non-executable), which causes the {{cat}} to throw the exception. The new log is below.
{code}
% hdfs --config starscream/hadoop dfs -put <(echo test) /test
2016-04-18 08:15:41,615 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hdfs --config starscream/hadoop dfs -put <(echo test) /test  2.72s user 0.18s system 128% cpu 2.269 total
% hdfs --config starscream/hadoop dfs -chmod 666 /test
2016-04-18 08:16:52,903 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hdfs --config starscream/hadoop dfs -chmod 666 /test  2.37s user 0.16s system 182% cpu 1.390 total
% hdfs --config starscream/hadoop dfs -cat /test/bar
2016-04-18 08:16:55,743 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
cat: Permission denied: user=foo, access=EXECUTE, inode="/test/bar":kevincox:supergroup:-rw-rw-rw-
HADOOP_USER_NAME=foo hdfs --config starscream/hadoop dfs -cat /test/bar  2.40s user 0.16s system 185% cpu 1.378 total
{code}
> Different results for exist call for file.ext/name
> --
> Key: HDFS-10276
> URL: https://issues.apache.org/jira/browse/HDFS-10276
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kevin Cox
> Assignee: Yuanbo Liu
>
> Given you have a file {{/file}}, an existence check for the path {{/file/whatever}} will give different responses for different implementations of FileSystem.
> LocalFileSystem will return false while DistributedFileSystem will throw {{org.apache.hadoop.security.AccessControlException: Permission denied: ..., access=EXECUTE, ...}}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
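For comparison, the local-filesystem side of the discrepancy can be reproduced without Hadoop at all: probing a path *under* an existing regular file is simply "not found". This is a hypothetical standalone sketch using `java.nio.file`, not the actual `LocalFileSystem` code path:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExistsUnderFile {
    public static void main(String[] args) throws IOException {
        // Create a regular file, then probe a path beneath it, analogous
        // to checking /file/whatever when /file is a plain file.
        Path file = Files.createTempFile("hdfs10276", ".dat");
        Path child = file.resolve("whatever");
        // A local filesystem reports "does not exist" -> false, whereas
        // DistributedFileSystem surfaces an AccessControlException because
        // it attempts an EXECUTE traversal through the non-directory inode.
        System.out.println(Files.exists(child));  // false
        Files.delete(file);
    }
}
```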
[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-10275: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.8, branch-2.7. Thanks [~linyiqun] for the contribution!
> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
> 
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Reporter: Lin Yiqun
> Assignee: Lin Yiqun
> Fix For: 2.7.3
> Attachments: HDFS-10275.001.patch
>
> The unit test {{TestDataNodeMetrics}} fails intermittently, with failures such as:
> {code}
> Results :
> Failed tests:
>   TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 expected: but was:
> Tests in error:
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout occurs at line 279 of {{TestDataNodeMetrics}}. Looking into the code, the real reason is that the {{TotalWriteTime}} metric frequently counts 0 in each iteration of creating a file, which leads to retries until the timeout. Debugging the test locally, the most likely cause of {{TotalWriteTime}} always being 0 is that the test uses {{SimulatedFSDataset}} to measure write time. Its inner class method {{SimulatedOutputStream#write}} only updates the {{length}} and throws the data away:
> {code}
> @Override
> public void write(byte[] b, int off, int len) throws IOException {
>   length += len;
> }
> {code}
> So the write hardly costs any time, and we should create the file with a real dataset instead of the simulated one. I verified locally that the test passes on the first attempt once the simulated dataset is removed, while with the old approach it retries many times trying to accumulate write time.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
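The zero-time behavior described above is easy to demonstrate in isolation. Below is a hypothetical stand-in for `SimulatedOutputStream` (not the Hadoop class itself): because the bytes are discarded, even a large write completes in well under the millisecond granularity that `TotalWriteTime` accumulates, so the metric stays 0.

```java
import java.io.OutputStream;

public class SimulatedWriteTiming {
    // Stand-in for SimulatedFSDataset's SimulatedOutputStream: it only
    // tracks the length and throws the bytes away, so writes are near-instant.
    static class DiscardingStream extends OutputStream {
        long length;
        @Override public void write(int b) { length++; }
        @Override public void write(byte[] b, int off, int len) { length += len; }
    }

    public static void main(String[] args) {
        DiscardingStream out = new DiscardingStream();
        long start = System.nanoTime();
        out.write(new byte[1 << 20], 0, 1 << 20);  // "write" 1 MiB
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Rounded to milliseconds -- the granularity the metric records --
        // the cost of a discarded write is almost always 0.
        System.out.println(out.length + " bytes in " + elapsedMs + " ms");
    }
}
```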
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245569#comment-15245569 ] Walter Su commented on HDFS-10275: -- Sorry, I didn't see that. The patch LGTM. +1.
[jira] [Commented] (HDFS-10299) libhdfs++: File length doesn't always include the last block if it's being written to
[ https://issues.apache.org/jira/browse/HDFS-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245552#comment-15245552 ] James Clampffer commented on HDFS-10299: This looks good to me; thanks for the fix, Xiaowei. I'll commit it momentarily.
> libhdfs++: File length doesn't always include the last block if it's being written to
> ---
> Key: HDFS-10299
> URL: https://issues.apache.org/jira/browse/HDFS-10299
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: James Clampffer
> Assignee: Xiaowei Zhu
> Attachments: HDFS-10299.HDFS-8707.000.patch
>
> It looks like we aren't factoring the last block of files that are being written to, or haven't been closed yet, into the length of the file.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8449) Add tasks count metrics to datanode for ECWorker
[ https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245534#comment-15245534 ] Kai Zheng commented on HDFS-8449: - Thanks Bo for this work on metrics. I will look at it.
> Add tasks count metrics to datanode for ECWorker
> 
> Key: HDFS-8449
> URL: https://issues.apache.org/jira/browse/HDFS-8449
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Li Bo
> Assignee: Li Bo
> Attachments: HDFS-8449-000.patch, HDFS-8449-001.patch, HDFS-8449-002.patch, HDFS-8449-003.patch, HDFS-8449-004.patch
>
> This subtask tries to record the EC recovery tasks a datanode has executed, including total, failed, and successful tasks.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
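The three counters described (total, failed, successful) only need two stored values, since successful can be derived. A minimal illustrative sketch of that bookkeeping; the actual patch would use Hadoop's metrics2 framework ({{@Metric}}-annotated mutable counters), not a hand-rolled class like this:

```java
import java.util.concurrent.atomic.LongAdder;

public class EcWorkerTaskMetrics {
    // LongAdder keeps the counters safe under concurrent task completion;
    // "successful" is derived rather than stored, so it can never disagree
    // with total - failed.
    private final LongAdder total = new LongAdder();
    private final LongAdder failed = new LongAdder();

    void onTaskFinished(boolean success) {
        total.increment();
        if (!success) {
            failed.increment();
        }
    }

    long getTotal() { return total.sum(); }
    long getFailed() { return failed.sum(); }
    long getSuccessful() { return getTotal() - getFailed(); }

    public static void main(String[] args) {
        EcWorkerTaskMetrics m = new EcWorkerTaskMetrics();
        m.onTaskFinished(true);
        m.onTaskFinished(true);
        m.onTaskFinished(false);
        System.out.println(m.getTotal() + " " + m.getFailed() + " " + m.getSuccessful());
        // prints "3 1 2"
    }
}
```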
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245521#comment-15245521 ] Lin Yiqun commented on HDFS-10275: -- Hi [~walter.k.su], I have removed {{SimulatedFSDataset.setFactory(conf);}} in my patch. Do you mean there is also no need to bump the timeout?
[jira] [Commented] (HDFS-8449) Add tasks count metrics to datanode for ECWorker
[ https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245457#comment-15245457 ] Hadoop QA commented on HDFS-8449: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 25s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 51s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 62 unchanged - 0 fixed = 64 total (was 62) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m 5s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 27s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 246m 15s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.namenode.TestEditLog | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.TestFileAppend | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped | | | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | hadoop.hdfs.TestSafeModeWithStripedFile | | JDK v1.7.0_95 Failed junit tests |
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245394#comment-15245394 ] Walter Su commented on HDFS-10275: -- Good analysis! I think a better way to do this is to use a real FSDataset. Just remove {{SimulatedFSDataset.setFactory(conf);}}. What do you think?
[jira] [Updated] (HDFS-10291) TestShortCircuitLocalRead failing
[ https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-10291: -- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks; patch is in.
> TestShortCircuitLocalRead failing
> -
> Key: HDFS-10291
> URL: https://issues.apache.org/jira/browse/HDFS-10291
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Fix For: 2.8.0
> Attachments: HDFS-10291-001.patch
>
> {{TestShortCircuitLocalRead}} is failing because the read length is considered to run off the end of the buffer.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)