[jira] Commented: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990462#comment-12990462 ]

Todd Lipcon commented on HDFS-900:
----------------------------------

I'm a little concerned that this wasn't committed with a test. The fix looks good, but manual testing won't prevent a regression.

> Corrupt replicas are not tracked correctly through block report from DN
> -----------------------------------------------------------------------
>
>                 Key: HDFS-900
>                 URL: https://issues.apache.org/jira/browse/HDFS-900
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Konstantin Shvachko
>            Priority: Blocker
>             Fix For: 0.22.0
>
>         Attachments: log-commented, reportCorruptBlock.patch, to-reproduce.patch
>
> This one is tough to describe, but essentially the following order of events is seen to occur:
> # A client marks one replica of a block as corrupt by telling the NN about it
> # Replication is then scheduled to make a new replica of this block
> # The replication completes, such that there are now 3 good replicas and 1 corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than telling this DN to delete the replica, the NN instead marks this as a new *good* replica of the block, and schedules deletion on one of the good replicas.
> I don't know if this is a data loss bug in the case of 1 corrupt replica with dfs.replication=2, but it seems feasible. I will attach a debug log with some commentary marked by '>', plus a unit test patch which I can get to reproduce this behavior reliably. (It's not a proper unit test, just some edits to an existing one to show it.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
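The mis-handling described in the issue above can be sketched in a few lines. This is a hypothetical illustration, not the actual NameNode code: the class, field, and method names here are invented, and the real corruptReplicasMap and blocks map are far richer structures. The essence of the fix is that a replica arriving in a block report must be checked against the set of replicas already marked corrupt before it is accepted as good.

```java
import java.util.*;

// Hypothetical sketch of block-report handling (names are invented, not HDFS APIs).
class BlockReportSketch {
    // blockId -> DN ids whose replica of that block is known corrupt
    static Map<Long, Set<String>> corruptReplicas = new HashMap<>();
    // blockId -> DN ids holding a good replica
    static Map<Long, Set<String>> goodReplicas = new HashMap<>();

    // Returns "ADD" if the reported replica is accepted as good,
    // "INVALIDATE" if the reporting DN should be told to delete it.
    static String processReportedBlock(long blockId, String dnId) {
        Set<String> corrupt =
            corruptReplicas.getOrDefault(blockId, Collections.emptySet());
        if (corrupt.contains(dnId)) {
            // The buggy behavior skipped this check: the corrupt replica was
            // re-added as good, and a good replica was scheduled for deletion.
            return "INVALIDATE";
        }
        goodReplicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(dnId);
        return "ADD";
    }

    public static void main(String[] args) {
        // dn1 holds a replica of block 42 that a client reported as corrupt.
        corruptReplicas.put(42L, new HashSet<>(Arrays.asList("dn1")));
        goodReplicas.put(42L, new HashSet<>(Arrays.asList("dn2", "dn3", "dn4")));
        System.out.println(processReportedBlock(42L, "dn1")); // INVALIDATE
        System.out.println(processReportedBlock(42L, "dn5")); // ADD
    }
}
```

With the check in place, dn1's block report leads to an invalidation command instead of promoting the corrupt replica back to the good set.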
[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990457#comment-12990457 ]

Konstantin Boudnik commented on HDFS-1602:
------------------------------------------

+1. The patch seems legitimate, and it does fix HDFS-1496.

> Fix HADOOP-4885 for it doesn't work as expected.
> ------------------------------------------------
>
>                 Key: HDFS-1602
>                 URL: https://issues.apache.org/jira/browse/HDFS-1602
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0
>            Reporter: Konstantin Boudnik
>         Attachments: HDFS-1602.patch
>
> NameNode storage restore functionality doesn't work (as HDFS-903 demonstrated). This needs to be either disabled, removed, or fixed. This feature also fails HDFS-1496.
[jira] Commented: (HDFS-875) NameNode incorrectly handles corrupt replicas
[ https://issues.apache.org/jira/browse/HDFS-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990452#comment-12990452 ]

Konstantin Shvachko commented on HDFS-875:
------------------------------------------

Now that HDFS-900 is closed, should we close this one as well? If not, then it would be good to have a precise description of what the problem is, how it reveals itself, whether it is reproducible, and whether anybody has seen it live.

> NameNode incorrectly handles corrupt replicas
> ---------------------------------------------
>
>                 Key: HDFS-875
>                 URL: https://issues.apache.org/jira/browse/HDFS-875
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.22.0
>
> I reviewed how NameNode handles corrupt replicas as part of work on HDFS-145. Compared to releases prior to 0.21, NameNode now does a good job identifying corrupt replicas, but it seems to me there are two flaws in how it handles them:
> 1. NameNode does not add corrupt replicas to the block locations as it did before;
> 2. If the corruption is caused by a generation stamp mismatch or state mismatch, the wrong GS and state do not get put in corruptReplicasMap. Therefore it may lead to the deletion of the wrong replica.
[jira] Updated: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-900:
-------------------------------------

    Resolution: Fixed
  Hadoop Flags: [Reviewed]
        Status: Resolved  (was: Patch Available)

I just committed this.
[jira] Commented: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990446#comment-12990446 ]

Konstantin Shvachko commented on HDFS-900:
------------------------------------------

Test failures:
TestFileConcurrentReader - HDFS-1401
TestStorageRestore - HDFS-1496

test-patch results:
{code}
     [exec] -1 overall.
     [exec] +1 @author. The patch does not contain any @author tags.
     [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
     [exec]                   Please justify why no new tests are needed for this patch.
     [exec]                   Also please list what manual steps were performed to verify this patch.
     [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
     [exec] +1 system test framework. The patch passed system test framework compile.
{code}

Testing of this patch was done manually and using Todd's utility attached above.
[jira] Commented: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990424#comment-12990424 ]

Jakob Homan commented on HDFS-900:
----------------------------------

+1
[jira] Commented: (HDFS-1515) Test append and quotas
[ https://issues.apache.org/jira/browse/HDFS-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990420#comment-12990420 ]

Po Cheung commented on HDFS-1515:
---------------------------------

There exists a test case for quotas and append in TestQuota.testSpaceCommands. Will that suffice?

{noformat}
    // Test Append :
    // verify space quota
    c = dfs.getContentSummary(quotaDir1);
    assertEquals(c.getSpaceQuota(), 4 * fileSpace);

    // verify space before append;
    c = dfs.getContentSummary(dstPath);
    assertEquals(c.getSpaceConsumed(), 3 * fileSpace);

    OutputStream out = dfs.append(file2);
    // appending 1 fileLen should succeed
    out.write(new byte[fileLen]);
    out.close();
    file2Len += fileLen; // after append

    // verify space after append;
    c = dfs.getContentSummary(dstPath);
    assertEquals(c.getSpaceConsumed(), 4 * fileSpace);

    // now increase the quota for quotaDir1
    dfs.setQuota(quotaDir1, FSConstants.QUOTA_DONT_SET, 5 * fileSpace);

    // Now, appending more than 1 fileLen should result in an error
    out = dfs.append(file2);
    hasException = false;
    try {
      out.write(new byte[fileLen + 1024]);
      out.flush();
      out.close();
    } catch (DSQuotaExceededException e) {
      hasException = true;
      IOUtils.closeStream(out);
    }
    assertTrue(hasException);
    file2Len += fileLen; // after partial append

    // verify space after partial append
    c = dfs.getContentSummary(dstPath);
    assertEquals(c.getSpaceConsumed(), 5 * fileSpace);
{noformat}

> Test append and quotas
> ----------------------
>
>                 Key: HDFS-1515
>                 URL: https://issues.apache.org/jira/browse/HDFS-1515
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: test
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Blocker
>             Fix For: 0.22.0, 0.23.0
>
> There is no test coverage for quotas and append. Let's add a test to TestQuota that verifies that quotas are updated correctly when appending to a file.
[jira] Closed: (HDFS-982) TestDelegationToken#testDelegationTokenWithRealUser is failing
[ https://issues.apache.org/jira/browse/HDFS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins closed HDFS-982.
----------------------------

> TestDelegationToken#testDelegationTokenWithRealUser is failing
> --------------------------------------------------------------
>
>                 Key: HDFS-982
>                 URL: https://issues.apache.org/jira/browse/HDFS-982
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: contrib/hdfsproxy, security
>    Affects Versions: 0.22.0
>            Reporter: Eli Collins
>            Assignee: Po Cheung
>            Priority: Blocker
>             Fix For: 0.22.0
>
> Hudson is reporting that TestDelegationToken#testDelegationTokenWithRealUser is failing on trunk.
> Failing for the past 10 builds (Since #223)
> Took 0.61 sec.
> Error Message:
> User: RealUser is not allowed to impersonate proxyUser
> Stacktrace:
> org.apache.hadoop.ipc.RemoteException: User: RealUser is not allowed to impersonate proxyUser
>   at org.apache.hadoop.ipc.Client.call(Client.java:887)
>   at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228)
>   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:220)
>   at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:151)
>   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:250)
>   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:217)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:87)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1747)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:69)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1775)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1763)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:101)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.getFileSystem(MiniDFSCluster.java:813)
>   at org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:147)
>   at org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:145)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:706)
>   at org.apache.hadoop.hdfs.security.TestDelegationToken.testDelegationTokenWithRealUser(TestDelegationToken.java:144)
[jira] Resolved: (HDFS-982) TestDelegationToken#testDelegationTokenWithRealUser is failing
[ https://issues.apache.org/jira/browse/HDFS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins resolved HDFS-982.
------------------------------

    Resolution: Fixed

Looks like it's fixed.
[jira] Commented: (HDFS-982) TestDelegationToken#testDelegationTokenWithRealUser is failing
[ https://issues.apache.org/jira/browse/HDFS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990413#comment-12990413 ]

Konstantin Shvachko commented on HDFS-982:
------------------------------------------

Eli, do you still see this failing? If not, I'll close it.
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990410#comment-12990410 ]

Sanjay Radia commented on HDFS-347:
-----------------------------------

Dhruba, you had mentioned that you have a prototype for this local optimization. Could you please share the performance improvement observed and your approach?

> DFS read performance suboptimal when client co-located on nodes with data
> -------------------------------------------------------------------------
>
>                 Key: HDFS-347
>                 URL: https://issues.apache.org/jira/browse/HDFS-347
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: George Porter
>            Assignee: Todd Lipcon
>         Attachments: HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc
>
> One of the major strategies Hadoop uses to get scalable data processing is to move the code to the data. However, putting the DFS client on the same physical node as the data blocks it acts on doesn't improve read performance as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem is due to the HDFS streaming protocol causing many more read I/O operations (iops) than necessary. Consider the case of a DFSClient fetching a 64 MB disk block from the DataNode process (running in a separate JVM) on the same machine. The DataNode will satisfy the single disk block request by sending data back to the HDFS client in 64 KB chunks. In BlockSender.java, this is done in the sendChunk() method, relying on Java's transferTo() method. Depending on the host O/S and JVM implementation, transferTo() is implemented as either a sendfilev() syscall or a pair of mmap() and write(). In either case, each chunk is read from the disk by issuing a separate I/O operation. The result is that the single request for a 64 MB block ends up hitting the disk as over a thousand smaller requests for 64 KB each.
> Since the DFSClient runs in a different JVM and process than the DataNode, shuttling data from the disk to the DFSClient also results in context switches each time network packets get sent (in this case, the 64 KB chunk turns into a large number of 1500-byte packet send operations). Thus we see a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think one option is providing a mechanism for a DFSClient to directly open data blocks that happen to be on the same machine. It could do this by examining the set of LocatedBlocks returned by the NameNode, marking those that should be resident on the local host. Since the DataNode and DFSClient (probably) share the same hadoop configuration, the DFSClient should be able to find the files holding the block data, and it could directly open them and send data back to the client. This would avoid the context switches imposed by the network layer, and would allow for much larger read buffers than 64 KB, which should reduce the number of iops imposed by each read block operation.
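The direct-local-read idea proposed above can be sketched as follows. This is a rough illustration only, not the DFSClient API: the class name, the host-matching helper, and the block-file parameter are all invented here. It shows the two pieces of the proposal: preferring a replica on the local host, and reading the block file directly with a much larger buffer than the 64 KB streaming chunk size.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch of short-circuit local reads (names are invented).
class LocalReadSketch {
    static String localHost = "host1"; // stand-in for the client's hostname

    // Prefer a replica located on this host; otherwise fall back to the
    // first location, which would be read via the streaming protocol.
    static String chooseReplicaHost(List<String> locations) {
        for (String host : locations) {
            if (host.equals(localHost)) {
                return host; // local: eligible for a direct file read
            }
        }
        return locations.get(0);
    }

    // Read the block file directly. A multi-megabyte buffer issues far
    // fewer I/O operations than streaming the block in 64 KB chunks.
    static long readLocalBlockFile(File blockFile, int bufferSize) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(
                new FileInputStream(blockFile), bufferSize)) {
            byte[] buf = new byte[bufferSize];
            int n;
            while ((n = in.read(buf)) > 0) {
                total += n;
            }
        }
        return total;
    }
}
```

Because the read never crosses the DataNode's socket, the per-packet context switches described in the issue disappear for the local case.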
[jira] Commented: (HDFS-982) TestDelegationToken#testDelegationTokenWithRealUser is failing
[ https://issues.apache.org/jira/browse/HDFS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990382#comment-12990382 ]

Po Cheung commented on HDFS-982:
--------------------------------

The test passes on trunk as well.
[jira] Commented: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix
[ https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990355#comment-12990355 ]

Boris Shkolnik commented on HDFS-1496:
--------------------------------------

Looks like the problem is that when we try to restore a storage directory, we format it, which always saves the current in-memory state into a _new_ fsimage. Instead, we should restore the storage without saving the state and creating a new fsimage; the image will be copied there during the checkpoint anyway. I've attached the patch to HDFS-1602. Please look at it and comment (the patch is for trunk).

> TestStorageRestore is failing after HDFS-903 fix
> -------------------------------------------------
>
>                 Key: HDFS-1496
>                 URL: https://issues.apache.org/jira/browse/HDFS-1496
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Konstantin Boudnik
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.22.0
>
>         Attachments: HDFS-1496.sh, HDFS-1496.sh, HDFS-1496.sh
>
> TestStorageRestore seems to be failing after the HDFS-903 commit. Running git bisect confirms it.
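The restore approach described in the comment above can be illustrated with a toy model. This is not the NameNode's storage code: the class, enum, and fields below are invented to show the distinction between the buggy path (restore eagerly saves a new fsimage) and the proposed one (restore only re-activates the directory, and the next checkpoint writes the image).

```java
import java.util.*;

// Hypothetical sketch of storage-directory restore (names are invented).
class StorageRestoreSketch {
    enum DirState { ACTIVE, FAILED }

    static Map<String, DirState> dirs = new HashMap<>();
    static List<String> savedImages = new ArrayList<>(); // dirs an fsimage was written to

    // Buggy behavior passed saveImageOnRestore=true (restore formatted the
    // directory, saving the in-memory state as a fresh fsimage immediately).
    // The proposed fix restores without saving.
    static void restoreDir(String dir, boolean saveImageOnRestore) {
        if (dirs.get(dir) == DirState.FAILED) {
            dirs.put(dir, DirState.ACTIVE);
            if (saveImageOnRestore) {
                savedImages.add(dir); // the problematic eager save
            }
        }
    }

    // A checkpoint writes the current image into every active directory,
    // so a restored directory gets its copy here anyway.
    static void checkpoint() {
        for (Map.Entry<String, DirState> e : dirs.entrySet()) {
            if (e.getValue() == DirState.ACTIVE) {
                savedImages.add(e.getKey());
            }
        }
    }
}
```

The point of the fix is visible in the ordering: after restoreDir(dir, false) the directory is active but holds no image until checkpoint() runs.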
[jira] Resolved: (HDFS-1607) Fix references to misspelled method name getProtocolSigature
[ https://issues.apache.org/jira/browse/HDFS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-1607.
-------------------------------

    Resolution: Fixed
  Hadoop Flags: [Reviewed]

Verified that this patch compiles against the current Common snapshot and committed to trunk.

> Fix references to misspelled method name getProtocolSigature
> -------------------------------------------------------------
>
>                 Key: HDFS-1607
>                 URL: https://issues.apache.org/jira/browse/HDFS-1607
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Trivial
>         Attachments: hdfs-1607.txt
[jira] Updated: (HDFS-1610) TestClientProtocolWithDelegationToken failing
[ https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1610:
------------------------------

    Resolution: Fixed
  Hadoop Flags: [Reviewed]
        Status: Resolved  (was: Patch Available)

> TestClientProtocolWithDelegationToken failing
> ----------------------------------------------
>
>                 Key: HDFS-1610
>                 URL: https://issues.apache.org/jira/browse/HDFS-1610
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-1610.txt
>
> Another instance of the same type of failure as MAPREDUCE-2300 (a mock protocol implementation isn't returning a protocol signature)
[jira] Updated: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-900:
-------------------------------------

        Status: Patch Available  (was: Open)
[jira] Commented: (HDFS-1610) TestClientProtocolWithDelegationToken failing
[ https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990287#comment-12990287 ]

Hadoop QA commented on HDFS-1610:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12470177/hdfs-1610.txt
against trunk revision 1066305.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 6 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).
    -1 core tests. The patch failed these core unit tests:
                   org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
    -1 contrib tests. The patch failed contrib unit tests.
    +1 system test framework. The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/146//testReport/
Release audit warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/146//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/146//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/146//console

This message is automatically generated.
[jira] Commented: (HDFS-1610) TestClientProtocolWithDelegationToken failing
[ https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990282#comment-12990282 ] Konstantin Boudnik commented on HDFS-1610: -- +1, the patch looks good and fixes the issues > TestClientProtocolWithDelegationToken failing > - > > Key: HDFS-1610 > URL: https://issues.apache.org/jira/browse/HDFS-1610 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hdfs-1610.txt > > > Another instance of the same type of failure as MAPREDUCE-2300 (a mock > protocol implementation isn't returning a protocol signature) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-1610) TestClientProtocolWithDelegationToken failing
[ https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1610: -- Status: Patch Available (was: Open) > TestClientProtocolWithDelegationToken failing > - > > Key: HDFS-1610 > URL: https://issues.apache.org/jira/browse/HDFS-1610 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hdfs-1610.txt > > > Another instance of the same type of failure as MAPREDUCE-2300 (a mock > protocol implementation isn't returning a protocol signature) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-1610) TestClientProtocolWithDelegationToken failing
[ https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1610: -- Attachment: hdfs-1610.txt Patch to fix both tests > TestClientProtocolWithDelegationToken failing > - > > Key: HDFS-1610 > URL: https://issues.apache.org/jira/browse/HDFS-1610 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hdfs-1610.txt > > > Another instance of the same type of failure as MAPREDUCE-2300 (a mock > protocol implementation isn't returning a protocol signature) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1610) TestClientProtocolWithDelegationToken failing
[ https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990239#comment-12990239 ] Todd Lipcon commented on HDFS-1610: --- TestBlockToken has the same problem > TestClientProtocolWithDelegationToken failing > - > > Key: HDFS-1610 > URL: https://issues.apache.org/jira/browse/HDFS-1610 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > > Another instance of the same type of failure as MAPREDUCE-2300 (a mock > protocol implementation isn't returning a protocol signature) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HDFS-1610) TestClientProtocolWithDelegationToken failing
TestClientProtocolWithDelegationToken failing - Key: HDFS-1610 URL: https://issues.apache.org/jira/browse/HDFS-1610 Project: Hadoop HDFS Issue Type: Bug Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Another instance of the same type of failure as MAPREDUCE-2300 (a mock protocol implementation isn't returning a protocol signature) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-1602) Fix HADOOP-4885 for it is doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated HDFS-1602: - Attachment: HDFS-1602.patch > Fix HADOOP-4885 for it is doesn't work as expected. > --- > > Key: HDFS-1602 > URL: https://issues.apache.org/jira/browse/HDFS-1602 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0 >Reporter: Konstantin Boudnik > Attachments: HDFS-1602.patch > > > NameNode storage restore functionality doesn't work (as HDFS-903 > demonstrated). This needs to be either disabled, or removed, or fixed. This > feature also fails HDFS-1496 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it is doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990218#comment-12990218 ] Boris Shkolnik commented on HDFS-1602: -- I've looked at the testStorageRestore failure. It seems the problem is that when we try to restore a storage dir, we format it, which always saves the current in-memory state into a new fsimage. Instead we should restore the storage without creating a new fsimage; it will be copied from the checkpoint. Here is a patch for trunk that does this. Please review. > Fix HADOOP-4885 for it is doesn't work as expected. > --- > > Key: HDFS-1602 > URL: https://issues.apache.org/jira/browse/HDFS-1602 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0 >Reporter: Konstantin Boudnik > > NameNode storage restore functionality doesn't work (as HDFS-903 > demonstrated). This needs to be either disabled, or removed, or fixed. This > feature also fails HDFS-1496 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
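Boris's restore idea, copying the image from a known-good checkpoint directory rather than "formatting" (which would serialize the live in-memory state), can be illustrated with a minimal, self-contained sketch. This uses plain `java.nio.file` as a stand-in; the class and method names are illustrative and not the actual `Storage`/`FSImage` code.

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of the HDFS-1602 restore approach: bring a failed storage
// directory back by copying the image file from a known-good directory
// (the checkpoint), instead of re-serializing current in-memory state.
public class RestoreSketch {
    static void restoreDir(Path goodDir, Path failedDir) throws IOException {
        Files.createDirectories(failedDir);
        // Copy the checkpoint image verbatim; do NOT write a new fsimage
        // from live NameNode state, which is what "formatting" would do.
        Files.copy(goodDir.resolve("fsimage"), failedDir.resolve("fsimage"),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The design point is that the restored directory must end up byte-identical to the checkpoint, so a later comparison between storage directories (as in testStorageRestore) succeeds.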
[jira] Updated: (HDFS-1609) rmr command is not displaying any error message when a path contains wildcard characters and does not exist.
[ https://issues.apache.org/jira/browse/HDFS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HDFS-1609: -- Attachment: HDFS-1609.patch Currently, if globAndProcess() doesn't find any file that matches the pattern, it does nothing. With HDFS-1609.patch, it raises FileNotFoundException if there are no matches. > rmr command is not displaying any error message when a path contains wildcard > characters and does not exist. > > > Key: HDFS-1609 > URL: https://issues.apache.org/jira/browse/HDFS-1609 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.20.1, 0.20.2 >Reporter: Uma Maheswara Rao G >Priority: Minor > Attachments: HDFS-1609.patch > > > When we give invalid directory path then it will show error message on the > console. But if we provide the wildcard expression in invalid directory path > then it will not show any error message even there is no pattern match for > that path. > linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin # ./hdfs dfs -rmr /test/test > rmr: cannot remove /test/test: No such file or directory. > *linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin # ./hdfs dfs -rmr /test* * > *linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin #* -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
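The behavior change in the patch (an empty glob expansion becomes an error instead of a silent no-op) can be sketched with `java.nio`'s glob matcher. This is a self-contained illustration, not the actual FsShell code; the method name `expand` is hypothetical.

```java
import java.io.FileNotFoundException;
import java.nio.file.*;
import java.util.*;

// Sketch of the HDFS-1609 change: when a glob pattern matches no paths,
// raise FileNotFoundException instead of silently doing nothing.
public class GlobSketch {
    static List<String> expand(List<String> paths, String pattern)
            throws FileNotFoundException {
        PathMatcher m = FileSystems.getDefault()
                .getPathMatcher("glob:" + pattern);
        List<String> matches = new ArrayList<>();
        for (String p : paths) {
            if (m.matches(Paths.get(p))) matches.add(p);
        }
        if (matches.isEmpty()) {
            // mirrors the non-wildcard message:
            // "rmr: cannot remove <path>: No such file or directory."
            throw new FileNotFoundException(
                "cannot remove " + pattern + ": No such file or directory.");
        }
        return matches;
    }
}
```

With this shape, `rmr /test*` on a filesystem with no `/test*` entries would surface the same error message as `rmr /test/test` does today.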
[jira] Created: (HDFS-1609) rmr command is not displaying any error message when a path contains wildcard characters and does not exist.
rmr command is not displaying any error message when a path contains wildcard characters and does not exist. Key: HDFS-1609 URL: https://issues.apache.org/jira/browse/HDFS-1609 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.20.2, 0.20.1 Reporter: Uma Maheswara Rao G Priority: Minor When we give invalid directory path then it will show error message on the console. But if we provide the wildcard expression in invalid directory path then it will not show any error message even there is no pattern match for that path. linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin # ./hdfs dfs -rmr /test/test rmr: cannot remove /test/test: No such file or directory. *linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin # ./hdfs dfs -rmr /test* * *linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin #* -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-1608) Provide overwrite option (-overwrite/-f) in put and copyFromLocal command line options
[ https://issues.apache.org/jira/browse/HDFS-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-1608: -- Description: FileSystem has the API *public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, Path dst)* This API provides overwrite option. But the mapping command line doesn't have this option. To maintain the consistency and better usage the command line option also can support the overwrite option like to put the files forcefully. ( put [-f] ) and also for copyFromLocal command line option. was: FileSystem has the API {code:xml} public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, Path dst) {code} This API provides overwrite option. But the mapping command line doesn't have this option. To maintain the consistency and better usage the command line option also can support the overwrite option like to put the files forcefully. ( put [-f] ) and also for copyFromLocal command line option. > Provide overwrite option (-overwrite/-f) in put and copyFromLocal command > line options > -- > > Key: HDFS-1608 > URL: https://issues.apache.org/jira/browse/HDFS-1608 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Affects Versions: 0.20.1, 0.20.2 >Reporter: Uma Maheswara Rao G >Priority: Minor > > FileSystem has the API > *public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] > srcs, Path dst)* > > > This API provides overwrite option. But the mapping command line doesn't have > this option. To maintain the consistency and better usage the command line > option also can support the overwrite option like to put the files > forcefully. ( put [-f] ) and also for copyFromLocal > command line option. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
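The requested `put [-f]` semantics, refuse to overwrite an existing destination unless the force flag is given, can be sketched with a self-contained local-filesystem stand-in. The real change would live in FsShell and delegate to `FileSystem.copyFromLocalFile` with its `overwrite` parameter; the class below is illustrative only.

```java
import java.io.IOException;
import java.nio.file.*;

// Stand-in for the proposed "put [-f]" behavior from HDFS-1608:
// the copy fails if the destination exists, unless overwrite is requested.
public class PutSketch {
    static void put(Path src, Path dst, boolean overwrite) throws IOException {
        if (Files.exists(dst) && !overwrite) {
            throw new IOException("put: " + dst + " already exists");
        }
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }
}
```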
[jira] Created: (HDFS-1608) Provide overwrite option (-overwrite/-f) in put and copyFromLocal command line options
Provide overwrite option (-overwrite/-f) in put and copyFromLocal command line options -- Key: HDFS-1608 URL: https://issues.apache.org/jira/browse/HDFS-1608 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.20.2, 0.20.1 Reporter: Uma Maheswara Rao G Priority: Minor FileSystem has the API {code:xml} public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, Path dst) {code} This API provides overwrite option. But the mapping command line doesn't have this option. To maintain the consistency and better usage the command line option also can support the overwrite option like to put the files forcefully. ( put [-f] ) and also for copyFromLocal command line option. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reassigned HDFS-900: Assignee: Konstantin Shvachko > Corrupt replicas are not tracked correctly through block report from DN > --- > > Key: HDFS-900 > URL: https://issues.apache.org/jira/browse/HDFS-900 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Konstantin Shvachko >Priority: Blocker > Fix For: 0.22.0 > > Attachments: log-commented, reportCorruptBlock.patch, > to-reproduce.patch > > > This one is tough to describe, but essentially the following order of events > is seen to occur: > # A client marks one replica of a block to be corrupt by telling the NN about > it > # Replication is then scheduled to make a new replica of this node > # The replication completes, such that there are now 3 good replicas and 1 > corrupt replica > # The DN holding the corrupt replica sends a block report. Rather than > telling this DN to delete the node, the NN instead marks this as a new *good* > replica of the block, and schedules deletion on one of the good replicas. > I don't know if this is a dataloss bug in the case of 1 corrupt replica with > dfs.replication=2, but it seems feasible. I will attach a debug log with some > commentary marked by '>', plus a unit test patch which I can get > to reproduce this behavior reliably. (it's not a proper unit test, just some > edits to an existing one to show it) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-900: - Attachment: reportCorruptBlock.patch Yes, this is indeed a bug in block report. After step 3 in Todd's description the NN has 3 good replicas and one corrupt. The corrupt replica is in recentInvalidatesSet, but not in the DatanodeDescriptor. That is, the replica is scheduled for deletion from the DN. See blockReceived(). But before it is deleted from the DN, that same DN sends a block report, which contains the replica. DatanodeDescriptor.processReport() treats it as a new replica because it is not in the DatanodeDescriptor, and as a good one since its blockId, generationStamp, and length are in order. The fix is to ignore replicas that are scheduled for deletion from this DN. I tested this patch with the test case attached by Todd, thanks. The test passes with the fix and fails without. The test case is not exactly a unit test, as it introduces changes to the FSNamesystem class for testing, so I did not include it in the patch. Todd, is it possible to convert your case into a real unit test? > Corrupt replicas are not tracked correctly through block report from DN > --- > > Key: HDFS-900 > URL: https://issues.apache.org/jira/browse/HDFS-900 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Priority: Blocker > Fix For: 0.22.0 > > Attachments: log-commented, reportCorruptBlock.patch, > to-reproduce.patch > > > This one is tough to describe, but essentially the following order of events > is seen to occur: > # A client marks one replica of a block to be corrupt by telling the NN about > it > # Replication is then scheduled to make a new replica of this node > # The replication completes, such that there are now 3 good replicas and 1 > corrupt replica > # The DN holding the corrupt replica sends a block report. 
Rather than > telling this DN to delete the node, the NN instead marks this as a new *good* > replica of the block, and schedules deletion on one of the good replicas. > I don't know if this is a dataloss bug in the case of 1 corrupt replica with > dfs.replication=2, but it seems feasible. I will attach a debug log with some > commentary marked by '>', plus a unit test patch which I can get > to reproduce this behavior reliably. (it's not a proper unit test, just some > edits to an existing one to show it) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
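The fix Konstantin describes, skipping reported replicas that are already scheduled for invalidation on the reporting DN, can be sketched as a minimal, self-contained simulation. The class and field names below are illustrative stand-ins (a plain map for recentInvalidatesSet), not the actual NameNode code.

```java
import java.util.*;

// Minimal simulation of the HDFS-900 fix: during block-report processing,
// a replica already scheduled for deletion on the reporting DN must not
// be re-registered as a good replica of the block.
public class BlockReportSim {
    // per-DN block IDs queued for deletion (stand-in for recentInvalidatesSet)
    static final Map<String, Set<Long>> invalidates = new HashMap<>();
    // per-DN block IDs the NN considers good replicas
    static final Map<String, Set<Long>> goodReplicas = new HashMap<>();

    static boolean processReportedBlock(String dn, long blockId) {
        // The fix: ignore replicas scheduled for deletion from this DN.
        if (invalidates.getOrDefault(dn, Set.of()).contains(blockId)) {
            return false; // not counted as a new good replica
        }
        goodReplicas.computeIfAbsent(dn, k -> new HashSet<>()).add(blockId);
        return true;
    }
}
```

Without the invalidates check, the block report from the DN holding the corrupt replica would re-register it as good, reproducing step 4 of Todd's sequence and triggering deletion of a genuinely good replica.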