[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework
[ https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734114#action_12734114 ] dhruba borthakur commented on HDFS-435: --- Very cool stuff! And the guide is very helpful. I have some questions from the user guide. {quote} pointcut callReceivePacket() : call(* OutputStream.write(..)) && withincode(* BlockReceiver.receivePacket(..)); // to further limit the application of this aspect a very narrow 'target' can be used as follows // && target(DataOutputStream) && !within(BlockReceiverAspects+); {quote} Can you please explain the above line in detail, what it means, etc.? Are things like pointcut and withincode AspectJ constructs? What is the intention of the above line? Thanks. Add orthogonal fault injection mechanism/framework -- Key: HDFS-435 URL: https://issues.apache.org/jira/browse/HDFS-435 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Attachments: Fault injection development guide and Framework HowTo.pdf It'd be great to have a fault injection mechanism for Hadoop. Having such a solution in place will make it possible to increase test coverage of error handling and recovery mechanisms, reduce reproduction time, and increase the reproduction rate of problems. Ideally, the system has to be orthogonal to the current code and test base. E.g. faults have to be injected at build time and would have to be configurable: all faults could be turned off, or only some of them would be allowed to happen. Also, fault injection has to be separated from the production build. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
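For readers with the same question: {{pointcut}}, {{call}}, {{withincode}}, {{target}}, and {{within}} are all AspectJ constructs, and the {{&&}} operators joining them appear to have been swallowed by JIRA's markup. A commented sketch of what the quoted pointcut expresses (type and method names taken from the quote itself):

```aspectj
// call(Sig)       - matches every call site whose signature matches Sig
// withincode(Sig) - restricts matches to join points occurring lexically
//                   inside a method matching Sig
// target(Type)    - restricts to calls whose receiver is an instance of Type
// !within(Type+)  - excludes code defined in Type or (via '+') its subtypes
pointcut callReceivePacket() :
    call(* OutputStream.write(..))                    // any OutputStream.write(...) call
    && withincode(* BlockReceiver.receivePacket(..)); // ...made from inside receivePacket
    // optional narrowing, per the commented-out lines in the quote:
    // && target(DataOutputStream) && !within(BlockReceiverAspects+);
```

So the pointcut picks out exactly the OutputStream.write(..) calls made from within BlockReceiver.receivePacket(..), which is where the fault-injection advice is then attached.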
[jira] Commented: (HDFS-496) Use PureJavaCrc32 in HDFS
[ https://issues.apache.org/jira/browse/HDFS-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734121#action_12734121 ] dhruba borthakur commented on HDFS-496: --- For the record: the PureJavaCrc32 computes the same CRC value as the current one, so this patch does not change the HDFS data format. Can you please link this with the one in the common project, because that JIRA has the performance numbers. Use PureJavaCrc32 in HDFS - Key: HDFS-496 URL: https://issues.apache.org/jira/browse/HDFS-496 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-496.txt Common now has a pure-Java CRC32 implementation which is more efficient than java.util.zip.CRC32. This issue is to make use of it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
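Since both implementations expose the standard {{java.util.zip.Checksum}} interface, "computes the same CRC value" can be sanity-checked against the well-known CRC-32 check value for the ASCII string "123456789". A minimal sketch using only the JDK class; PureJavaCrc32 would be exercised identically, just substituting the instance:

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class CrcCheck {
    // Compute the checksum of a byte array with any Checksum implementation.
    public static long crcOf(Checksum c, byte[] data) {
        c.reset();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes();
        long v = crcOf(new CRC32(), check);
        // The standard CRC-32 check value for "123456789" is 0xCBF43926;
        // an implementation that preserves the HDFS data format must match it.
        System.out.println(Long.toHexString(v)); // cbf43926
        // org.apache.hadoop.util.PureJavaCrc32 would be verified the same way:
        //   crcOf(new PureJavaCrc32(), check) == v
    }
}
```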
[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734125#action_12734125 ] dhruba borthakur commented on HDFS-200: --- Hi Ruyue, your option of excluding specific datanodes (specified by the client) sounds reasonable. This might help in the case of network partitioning where a specific client loses access to a set of datanodes while those datanodes are alive and well and able to send heartbeats to the namenode. Can you please create a separate JIRA for your proposed fix and attach your patch there? Thanks. In HDFS, sync() not yet guarantees data available to the new readers Key: HDFS-200 URL: https://issues.apache.org/jira/browse/HDFS-200 Project: Hadoop HDFS Issue Type: New Feature Reporter: Tsz Wo (Nicholas), SZE Assignee: dhruba borthakur Priority: Blocker Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says: * A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file However, this feature is not yet implemented. Note that the operation 'flushed' is now called sync. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework
[ https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734205#action_12734205 ] Tsz Wo (Nicholas), SZE commented on HDFS-435: - Yes, the guide is very useful for aop test development. We should check in the doc. Dhruba, where should we put the doc? Any idea? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-496) Use PureJavaCrc32 in HDFS
[ https://issues.apache.org/jira/browse/HDFS-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-496: Component/s: hdfs client, data-node Hadoop Flags: [Reviewed] +1 patch looks good. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-265) Revisit append
[ https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734235#action_12734235 ] Hairong Kuang commented on HDFS-265: In this design, a new generation stamp is always fetched from the NameNode before a new pipeline is set up when handling errors. So if an access token is also fetched along with the generation stamp, things should be OK. Revisit append -- Key: HDFS-265 URL: https://issues.apache.org/jira/browse/HDFS-265 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Assignee: Hairong Kuang Attachments: appendDesign.pdf, appendDesign.pdf, appendDesign1.pdf, AppendSpec.pdf, TestPlanAppend.html HADOOP-1700 and related issues have put a lot of effort into providing the first implementation of append. However, append is a complex feature, and it turns out that there are issues that initially seemed trivial but need a careful design. This jira revisits append, aiming for a design and implementation supporting semantics that are acceptable to its users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely
[ https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734265#action_12734265 ] Bill Zeller commented on HDFS-167: -- The offending code: {quote} if (--retries == 0 && !NotReplicatedYetException.class.getName().equals(e.getClassName())) { throw e; } {quote} This code attempts to retry until the above condition is met. The condition says to {{throw e}} if the number of retries is 0 and the exception thrown is not a {{NotReplicatedYetException}}. However, the code later assumes that any exception not thrown is a {{NotReplicatedYetException}}. The intent seems to be to retry a certain number of times if a NotReplicatedYetException is thrown and to throw any other type of exception. The {{&&}} in the if statement should be changed to an {{||}}. DFSClient continues to retry indefinitely - Key: HDFS-167 URL: https://issues.apache.org/jira/browse/HDFS-167 Project: Hadoop HDFS Issue Type: Bug Reporter: Derek Wollenstein Priority: Minor I encountered a bug when trying to upload data using the Hadoop DFS Client. After receiving a NotReplicatedYetException, the DFSClient will normally retry its upload up to some limited number of times.
In this case, I found that this retry loop continued indefinitely, to the point that the number of tries remaining was negative:

2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for replication for 21 seconds
2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_20090325_us/logs_20090325_us_13 retries left -1

The stack trace for the failure that's retrying is:

2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:filename
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
2009-03-25 16:20:02 [INFO] at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
2009-03-25 16:20:02 [INFO] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-25 16:20:02 [INFO] at java.lang.reflect.Method.invoke(Method.java:597)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.Client.call(Client.java:697)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
2009-03-25 16:20:02 [INFO] at $Proxy0.addBlock(Unknown Source)
2009-03-25 16:20:02 [INFO] at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
2009-03-25 16:20:02 [INFO] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-25 16:20:02 [INFO] at java.lang.reflect.Method.invoke(Method.java:597)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
2009-03-25 16:20:02 [INFO] at $Proxy0.addBlock(Unknown Source)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
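The effect of the operator can be sketched in isolation. This is an illustrative reduction of the retry decision, not the DFSClient source: with {{&&}}, a {{NotReplicatedYetException}} alone keeps the loop running even after {{retries}} goes negative, while the proposed {{||}} gives up as soon as retries are exhausted or a different exception arrives:

```java
public class RetryLogic {
    // Stand-in for org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException.
    static class NotReplicatedYetException extends RuntimeException {}

    // Proposed condition with '||': rethrow when retries are exhausted
    // OR the failure is not a NotReplicatedYetException; otherwise retry.
    public static boolean shouldRetry(int retriesLeft, RuntimeException e) {
        if (retriesLeft == 0 || !(e instanceof NotReplicatedYetException)) {
            throw e;
        }
        return true;
    }

    public static void main(String[] args) {
        int retries = 3, attempts = 0;
        try {
            while (true) {
                attempts++;
                shouldRetry(--retries, new NotReplicatedYetException());
            }
        } catch (NotReplicatedYetException e) {
            // Thrown once retries reach 0 -- the loop is now bounded.
        }
        System.out.println("attempts = " + attempts); // attempts = 3
    }
}
```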
[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely
[ https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734275#action_12734275 ] Bill Zeller commented on HDFS-167: -- The above code should be: {code:title=org.apache.hadoop.hdfs.DFSClient::locateFollowingBlock|borderStyle=solid} if (--retries == 0 || !NotReplicatedYetException.class.getName().equals(e.getClassName())) { throw e; } {code} (Sorry about the repost) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely
[ https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734307#action_12734307 ] dhruba borthakur commented on HDFS-167: --- Hi Bill, will it be possible for you to submit this as a patch and a unit test? Details are here: http://wiki.apache.org/hadoop/HowToContribute -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-492) Expose corrupt replica/block information
[ https://issues.apache.org/jira/browse/HDFS-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Zeller updated HDFS-492: - Fix Version/s: 0.21.0 Status: Patch Available (was: Open) Expose corrupt replica/block information Key: HDFS-492 URL: https://issues.apache.org/jira/browse/HDFS-492 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Affects Versions: 0.21.0 Reporter: Bill Zeller Priority: Minor Fix For: 0.21.0 Attachments: hdfs-492-4.patch, hdfs-492-5.patch Original Estimate: 48h Remaining Estimate: 48h This adds two additional functions to FSNamesystem to provide more information about corrupt replicas. It also adds two servlets to the namenode that provide information (in JSON) about all blocks with corrupt replicas as well as information about a specific block. It also changes the file browsing servlet by adding a link from block ids to the above-mentioned block information page. These JSON pages are designed to be used by client-side tools which wish to analyze corrupt blocks/replicas. The only change to an existing (non-servlet) class is described below. Currently, CorruptReplicasMap stores a map of corrupt replica information and allows insertion and deletion. It also gives information about the corrupt replicas for a specific block. It does not allow iteration over all corrupt blocks. Two additional functions will be added to FSNamesystem (which will call BlockManager, which will call CorruptReplicasMap). The first will return the size of the corrupt replicas map, which represents the number of blocks that have corrupt replicas (this is less than the total number of corrupt replicas if a block has multiple corrupt replicas). The second will allow paging through a list of block ids that contain corrupt replicas: {{public synchronized List<Long> getCorruptReplicaBlockIds(int n, Long startingBlockId)}} {{n}} is the number of block ids to return and {{startingBlockId}} is the block id offset.
To prevent a large number of items being returned at one time, {{n}} is constrained to 0 <= {{n}} <= 100. If {{startingBlockId}} is null, up to {{n}} items are returned starting at the beginning of the list. Ordering is enforced through the internal use of a TreeMap in CorruptReplicasMap. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
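The paging contract described above maps naturally onto {{TreeMap.tailMap}}. A sketch of what such a call could look like, assuming the TreeMap-backed CorruptReplicasMap; apart from the quoted method name and signature, all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class CorruptBlockPager {
    // Illustrative stand-in for CorruptReplicasMap's sorted map of
    // blockId -> corrupt-replica info; TreeMap keeps keys in order.
    private final TreeMap<Long, String> corruptReplicas = new TreeMap<>();

    public void markCorrupt(long blockId, String datanode) {
        corruptReplicas.put(blockId, datanode);
    }

    // Returns up to n block ids (n clamped to [0, 100]) that have corrupt
    // replicas, starting strictly after startingBlockId, or from the
    // smallest id when startingBlockId is null.
    public synchronized List<Long> getCorruptReplicaBlockIds(int n, Long startingBlockId) {
        n = Math.max(0, Math.min(n, 100));
        Iterable<Long> ids = (startingBlockId == null)
            ? corruptReplicas.keySet()
            : corruptReplicas.tailMap(startingBlockId, false).keySet(); // exclusive start
        List<Long> page = new ArrayList<>();
        for (Long id : ids) {
            if (page.size() >= n) break;
            page.add(id);
        }
        return page;
    }

    public static void main(String[] args) {
        CorruptBlockPager pager = new CorruptBlockPager();
        for (long id = 101; id <= 105; id++) pager.markCorrupt(id, "datanode-" + id);
        System.out.println(pager.getCorruptReplicaBlockIds(2, null)); // [101, 102]
        System.out.println(pager.getCorruptReplicaBlockIds(2, 102L)); // [103, 104]
    }
}
```

The sorted iteration order of the TreeMap is what makes the last id of one page usable as the next call's offset, which is the ordering guarantee the description relies on.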
[jira] Assigned: (HDFS-167) DFSClient continues to retry indefinitely
[ https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur reassigned HDFS-167: - Assignee: Bill Zeller -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HDFS-492) Expose corrupt replica/block information
[ https://issues.apache.org/jira/browse/HDFS-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Zeller reassigned HDFS-492: Assignee: Bill Zeller -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.