[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734114#action_12734114
 ] 

dhruba borthakur commented on HDFS-435:
---

Very cool stuff! And the guide is very helpful. I have some questions from the 
user guide.

{quote}
pointcut callReceivePacket() :
  call (* OutputStream.write(..))
  && withincode (* BlockReceiver.receivePacket(..))
  // to further limit the application of this aspect a very narrow
  // 'target' can be used as follows
  //   && target(DataOutputStream)
  && !within(BlockReceiverAspects+);
{quote}

Can you please explain the above lines in detail, what they mean, etc. Things 
like pointcut and withincode, are these AspectJ constructs? What is the 
intention of the above code? Thanks.
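
(For reference while reading the guide: pointcut, call, withincode, target and 
within are AspectJ constructs. Below is a minimal, annotation-style sketch of 
how such a pointcut could drive fault injection; the aspect name, the fully 
qualified types and the injection condition are illustrative assumptions, not 
the actual framework code.)

{code:java}
import java.io.IOException;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;

@Aspect
public class BlockReceiverFaultAspect {

  // Select calls to OutputStream.write(..) made from within
  // BlockReceiver.receivePacket(..), excluding code in this aspect itself
  // so the advice does not apply to its own join points.
  @Pointcut("call(* java.io.OutputStream.write(..))"
      + " && withincode(* org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(..))"
      + " && !within(BlockReceiverFaultAspect)")
  public void callReceivePacket() {}

  // Around advice: with some probability, throw an IOException instead of
  // performing the selected write, simulating a failing disk or stream.
  @Around("callReceivePacket()")
  public Object maybeInjectFault(ProceedingJoinPoint jp) throws Throwable {
    if (Math.random() < 0.01) {           // injection probability: illustrative only
      throw new IOException("fault injected before OutputStream.write");
    }
    return jp.proceed();                  // otherwise perform the real write
  }
}
{code}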


 Add orthogonal fault injection mechanism/framework
 --

 Key: HDFS-435
 URL: https://issues.apache.org/jira/browse/HDFS-435
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Attachments: Fault injection development guide and Framework HowTo.pdf


 It'd be great to have a fault injection mechanism for Hadoop.
 Having such a solution in place will allow us to increase test coverage of 
 error handling and recovery mechanisms, reduce reproduction time, and increase 
 the reproduction rate of problems.
 Ideally, the system has to be orthogonal to the current code and test base, 
 e.g. faults have to be injected at build time and have to be configurable: 
 all faults could be turned off, or only some of them allowed to happen. Also, 
 fault injection has to be separated from the production build.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-496) Use PureJavaCrc32 in HDFS

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734121#action_12734121
 ] 

dhruba borthakur commented on HDFS-496:
---

For the record: PureJavaCrc32 computes the same CRC value as the current 
implementation, so this patch does not change the HDFS data format. Can you 
please link this with the corresponding JIRA in the Common project, because 
that one has the performance numbers.
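
To illustrate the equivalence, here is a minimal sketch (assuming PureJavaCrc32 
from Common implements java.util.zip.Checksum, as the posted implementation 
does) that computes the CRC of the same bytes with both classes and compares 
the results:

{code:java}
import java.util.zip.CRC32;
import java.util.zip.Checksum;

import org.apache.hadoop.util.PureJavaCrc32;

public class CrcEquivalenceCheck {
  public static void main(String[] args) {
    byte[] data = "some block data".getBytes();

    Checksum jdkCrc = new CRC32();              // current implementation
    jdkCrc.update(data, 0, data.length);

    Checksum pureJavaCrc = new PureJavaCrc32(); // pure-Java implementation from Common
    pureJavaCrc.update(data, 0, data.length);

    // Identical values mean on-disk and on-wire checksums are unchanged.
    System.out.println("CRCs match: " + (jdkCrc.getValue() == pureJavaCrc.getValue()));
  }
}
{code}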

 Use PureJavaCrc32 in HDFS
 -

 Key: HDFS-496
 URL: https://issues.apache.org/jira/browse/HDFS-496
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: hdfs-496.txt


 Common now has a pure java CRC32 implementation which is more efficient than 
 java.util.zip.CRC32. This issue is to make use of it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734125#action_12734125
 ] 

dhruba borthakur commented on HDFS-200:
---

Hi Ruyue, your option of excluding specific datanodes (specified by the client) 
sounds reasonable. This might help in the case of network partitioning, where a 
specific client loses access to a set of datanodes while those datanodes are 
alive and well and able to send heartbeats to the namenode. Can you please 
create a separate JIRA for your proposed fix and attach your patch there? Thanks.

 In HDFS, sync() not yet guarantees data available to the new readers
 

 Key: HDFS-200
 URL: https://issues.apache.org/jira/browse/HDFS-200
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Tsz Wo (Nicholas), SZE
Assignee: dhruba borthakur
Priority: Blocker
 Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
 fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
 fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, 
 fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, 
 fsyncConcurrentReaders9.patch, 
 hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
 hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java, 
 Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java


 In the append design doc 
 (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
 says
 * A reader is guaranteed to be able to read data that was 'flushed' before 
 the reader opened the file
 However, this feature is not yet implemented.  Note that the operation 
 'flushed' is now called sync.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734205#action_12734205
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-435:
-

Yes, the guide is very useful for AOP test development. We should check in the 
doc.

Dhruba, where should we put the doc?  Any idea?

 Add orthogonal fault injection mechanism/framework
 --

 Key: HDFS-435
 URL: https://issues.apache.org/jira/browse/HDFS-435
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Attachments: Fault injection development guide and Framework 
 HowTo.pdf, Fault injection development guide and Framework HowTo.pdf


 It'd be great to have a fault injection mechanism for Hadoop.
 Having such a solution in place will allow us to increase test coverage of 
 error handling and recovery mechanisms, reduce reproduction time, and increase 
 the reproduction rate of problems.
 Ideally, the system has to be orthogonal to the current code and test base, 
 e.g. faults have to be injected at build time and have to be configurable: 
 all faults could be turned off, or only some of them allowed to happen. Also, 
 fault injection has to be separated from the production build.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-496) Use PureJavaCrc32 in HDFS

2009-07-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-496:


 Component/s: hdfs client
  data-node
Hadoop Flags: [Reviewed]

+1 patch looks good.

 Use PureJavaCrc32 in HDFS
 -

 Key: HDFS-496
 URL: https://issues.apache.org/jira/browse/HDFS-496
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: hdfs-496.txt


 Common now has a pure java CRC32 implementation which is more efficient than 
 java.util.zip.CRC32. This issue is to make use of it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-265) Revisit append

2009-07-22 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734235#action_12734235
 ] 

Hairong Kuang commented on HDFS-265:


In this design, a new generation stamp is always fetched from NameNode before a 
new pipeline is set up when handling errors. So if an access token is also 
fetched along with the generation stamp, things should be OK.

 Revisit append
 --

 Key: HDFS-265
 URL: https://issues.apache.org/jira/browse/HDFS-265
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Attachments: appendDesign.pdf, appendDesign.pdf, appendDesign1.pdf, 
 AppendSpec.pdf, TestPlanAppend.html


 HADOOP-1700 and related issues have put a lot of effort into providing the 
 first implementation of append. However, append is a complex feature. It turns 
 out that there are issues that initially seemed trivial but need a careful 
 design. This jira revisits append, aiming for a design and implementation 
 supporting semantics that are acceptable to its users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread Bill Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734265#action_12734265
 ] 

Bill Zeller commented on HDFS-167:
--

The offending code:

{quote}
if (--retries == 0 &&
    !NotReplicatedYetException.class.getName().
        equals(e.getClassName())) {
  throw e;
}
{quote}

This code attempts to retry until the above condition is met. The condition 
says to {{throw e}} if the number of retries has reached 0 and the exception 
thrown is not a {{NotReplicatedYetException}}. However, the code later assumes 
that any exception not rethrown is a {{NotReplicatedYetException}}. The intent 
seems to be to retry a certain number of times if a {{NotReplicatedYetException}} 
is thrown and to rethrow any other type of exception. The {{&&}} in the if 
statement should be changed to an {{||}}.
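
To make the effect concrete, here is a small standalone sketch (illustrative 
only, not HDFS code) that evaluates the current condition and the proposed one 
while NotReplicatedYetException keeps being thrown:

{code:java}
public class RetryConditionDemo {
  public static void main(String[] args) {
    // The server keeps throwing NotReplicatedYetException, so the
    // "exception is not a NotReplicatedYetException" test stays false.
    boolean isOtherException = false;
    int retries = 3;

    for (int attempt = 1; attempt <= 6; attempt++) {
      --retries;
      boolean currentThrows = (retries == 0) && isOtherException; // existing && condition
      boolean fixedThrows   = (retries == 0) || isOtherException; // proposed || condition
      System.out.println("retries=" + retries
          + " currentThrows=" + currentThrows
          + " fixedThrows=" + fixedThrows);
    }
    // With &&, the client never gives up on a stream of
    // NotReplicatedYetExceptions and the counter goes negative ("retries
    // left -1" in the log above); with ||, it throws once the retry budget
    // is exhausted.
  }
}
{code}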

 DFSClient continues to retry indefinitely
 -

 Key: HDFS-167
 URL: https://issues.apache.org/jira/browse/HDFS-167
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Derek Wollenstein
Priority: Minor

 I encountered a bug when trying to upload data using the Hadoop DFS Client.  
 After receiving a NotReplicatedYetException, the DFSClient will normally 
 retry its upload up to some limited number of times.  In this case, I found 
 that this retry loop continued indefinitely, to the point that the number of 
 tries remaining was negative:
 2009-03-25 16:20:02 [INFO] 
 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
 replication for 21 seconds
 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
 NotReplicatedYetException sleeping 
 /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
 0325_us/logs_20090325_us_13 retries left -1
 The stack trace for the failure that's retrying is:
 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.server.namenode.NotReplicated
 YetException: Not replicated yet:filename
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 2009-03-25 16:20:02 [INFO]  at 
 java.lang.reflect.Method.invoke(Method.java:597)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
 2009-03-25 16:20:02 [INFO] 
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.Client.call(Client.java:697)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 2009-03-25 16:20:02 [INFO]  at 
 java.lang.reflect.Method.invoke(Method.java:597)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread Bill Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734275#action_12734275
 ] 

Bill Zeller commented on HDFS-167:
--

The above code should be:
{code:title=org.apache.hadoop.hdfs.DFSClient::locateFollowingBlock|borderStyle=solid}
if (--retries == 0 &&
    !NotReplicatedYetException.class.getName().
        equals(e.getClassName())) {
  throw e;
}
{code} 

(Sorry about the repost)

 DFSClient continues to retry indefinitely
 -

 Key: HDFS-167
 URL: https://issues.apache.org/jira/browse/HDFS-167
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Derek Wollenstein
Priority: Minor

 I encountered a bug when trying to upload data using the Hadoop DFS Client.  
 After receiving a NotReplicatedYetException, the DFSClient will normally 
 retry its upload up to some limited number of times.  In this case, I found 
 that this retry loop continued indefinitely, to the point that the number of 
 tries remaining was negative:
 2009-03-25 16:20:02 [INFO] 
 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
 replication for 21 seconds
 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
 NotReplicatedYetException sleeping 
 /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
 0325_us/logs_20090325_us_13 retries left -1
 The stack trace for the failure that's retrying is:
 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.server.namenode.NotReplicated
 YetException: Not replicated yet:filename
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 2009-03-25 16:20:02 [INFO]  at 
 java.lang.reflect.Method.invoke(Method.java:597)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
 2009-03-25 16:20:02 [INFO] 
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.Client.call(Client.java:697)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 2009-03-25 16:20:02 [INFO]  at 
 java.lang.reflect.Method.invoke(Method.java:597)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734307#action_12734307
 ] 

dhruba borthakur commented on HDFS-167:
---

Hi Bill, will it be possible for you to submit this as a patch and a unit test? 
Details are here: http://wiki.apache.org/hadoop/HowToContribute

 DFSClient continues to retry indefinitely
 -

 Key: HDFS-167
 URL: https://issues.apache.org/jira/browse/HDFS-167
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Derek Wollenstein
Priority: Minor

 I encountered a bug when trying to upload data using the Hadoop DFS Client.  
 After receiving a NotReplicatedYetException, the DFSClient will normally 
 retry its upload up to some limited number of times.  In this case, I found 
 that this retry loop continued indefinitely, to the point that the number of 
 tries remaining was negative:
 2009-03-25 16:20:02 [INFO] 
 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
 replication for 21 seconds
 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
 NotReplicatedYetException sleeping 
 /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
 0325_us/logs_20090325_us_13 retries left -1
 The stack trace for the failure that's retrying is:
 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.server.namenode.NotReplicated
 YetException: Not replicated yet:filename
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 2009-03-25 16:20:02 [INFO]  at 
 java.lang.reflect.Method.invoke(Method.java:597)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
 2009-03-25 16:20:02 [INFO] 
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.Client.call(Client.java:697)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 2009-03-25 16:20:02 [INFO]  at 
 java.lang.reflect.Method.invoke(Method.java:597)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-492) Expose corrupt replica/block information

2009-07-22 Thread Bill Zeller (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Zeller updated HDFS-492:
-

Fix Version/s: 0.21.0
   Status: Patch Available  (was: Open)

 Expose corrupt replica/block information
 

 Key: HDFS-492
 URL: https://issues.apache.org/jira/browse/HDFS-492
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Affects Versions: 0.21.0
Reporter: Bill Zeller
Priority: Minor
 Fix For: 0.21.0

 Attachments: hdfs-492-4.patch, hdfs-492-5.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 This adds two additional functions to FSNamesystem to provide more 
 information about corrupt replicas. It also adds two servlets to the namenode 
 that provide information (in JSON) about all blocks with corrupt replicas as 
 well as information about a specific block. It also changes the file browsing 
 servlet by adding a link from block ids to the above mentioned block 
 information page.
 These JSON pages are designed to be used by client side tools which wish to 
 analyze corrupt block/replicas. The only change to an existing (non-servlet) 
 class is described below.  
 Currently, CorruptReplicasMap stores a map of corrupt replica information and 
 allows insertion and deletion. It also gives information about the corrupt 
 replicas for a specific block. It does not allow iteration over all corrupt 
 blocks. Two additional functions will be added to FSNamesystem (which will 
 call BlockManager which will call CorruptReplicasMap). The first will return 
 the size of the corrupt replicas map, which represents the number of blocks 
 that have corrupt replicas (this may be less than the total number of corrupt 
 replicas, since a block can have multiple corrupt replicas). The second will 
 allow paging through a list of block ids that have corrupt replicas:
 {{public synchronized List<Long> getCorruptReplicaBlockIds(int n, Long 
 startingBlockId)}}
 {{n}} is the number of block ids to return and {{startingBlockId}} is the 
 block id offset. To prevent a large number of items being returned at one 
 time, {{n}} is constrained to 0 <= {{n}} <= 100. If {{startingBlockId}} is 
 null, up to {{n}} items are returned starting at the beginning of the list. 
 Ordering is enforced through the internal use of a TreeMap in 
 CorruptReplicasMap.
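
 A sketch of how a client-side tool might page through the ids using the 
 signature above; the CorruptBlockSource interface is a stand-in for whatever 
 exposes the call (the JSON servlet backed by FSNamesystem in this patch), and 
 treating {{startingBlockId}} as an exclusive cursor is an assumption here:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Stand-in for the servlet/FSNamesystem call described above (hypothetical).
interface CorruptBlockSource {
  List<Long> getCorruptReplicaBlockIds(int n, Long startingBlockId);
}

public class CorruptBlockPager {
  private static final int PAGE_SIZE = 100;     // the maximum n allowed

  public static List<Long> fetchAll(CorruptBlockSource source) {
    List<Long> all = new ArrayList<Long>();
    Long cursor = null;                         // null = start at the beginning
    while (true) {
      List<Long> page = source.getCorruptReplicaBlockIds(PAGE_SIZE, cursor);
      all.addAll(page);
      if (page.size() < PAGE_SIZE) {
        break;                                  // last (possibly partial) page
      }
      // Assumes the next page starts after the last id returned; ordering is
      // guaranteed by the TreeMap in CorruptReplicasMap.
      cursor = page.get(page.size() - 1);
    }
    return all;
  }
}
{code}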

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur reassigned HDFS-167:
-

Assignee: Bill Zeller

 DFSClient continues to retry indefinitely
 -

 Key: HDFS-167
 URL: https://issues.apache.org/jira/browse/HDFS-167
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Derek Wollenstein
Assignee: Bill Zeller
Priority: Minor

 I encountered a bug when trying to upload data using the Hadoop DFS Client.  
 After receiving a NotReplicatedYetException, the DFSClient will normally 
 retry its upload up to some limited number of times.  In this case, I found 
 that this retry loop continued indefinitely, to the point that the number of 
 tries remaining was negative:
 2009-03-25 16:20:02 [INFO] 
 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
 replication for 21 seconds
 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
 NotReplicatedYetException sleeping 
 /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
 0325_us/logs_20090325_us_13 retries left -1
 The stack trace for the failure that's retrying is:
 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.server.namenode.NotReplicated
 YetException: Not replicated yet:filename
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 2009-03-25 16:20:02 [INFO]  at 
 java.lang.reflect.Method.invoke(Method.java:597)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
 2009-03-25 16:20:02 [INFO] 
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.Client.call(Client.java:697)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 2009-03-25 16:20:02 [INFO]  at 
 java.lang.reflect.Method.invoke(Method.java:597)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
 2009-03-25 16:20:02 [INFO]  at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HDFS-492) Expose corrupt replica/block information

2009-07-22 Thread Bill Zeller (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Zeller reassigned HDFS-492:


Assignee: Bill Zeller

 Expose corrupt replica/block information
 

 Key: HDFS-492
 URL: https://issues.apache.org/jira/browse/HDFS-492
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Affects Versions: 0.21.0
Reporter: Bill Zeller
Assignee: Bill Zeller
Priority: Minor
 Fix For: 0.21.0

 Attachments: hdfs-492-4.patch, hdfs-492-5.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 This adds two additional functions to FSNamesystem to provide more 
 information about corrupt replicas. It also adds two servlets to the namenode 
 that provide information (in JSON) about all blocks with corrupt replicas as 
 well as information about a specific block. It also changes the file browsing 
 servlet by adding a link from block ids to the above mentioned block 
 information page.
 These JSON pages are designed to be used by client side tools which wish to 
 analyze corrupt block/replicas. The only change to an existing (non-servlet) 
 class is described below.  
 Currently, CorruptReplicasMap stores a map of corrupt replica information and 
 allows insertion and deletion. It also gives information about the corrupt 
 replicas for a specific block. It does not allow iteration over all corrupt 
 blocks. Two additional functions will be added to FSNamesystem (which will 
 call BlockManager which will call CorruptReplicasMap). The first will return 
 the size of the corrupt replicas map, which represents the number of blocks 
 that have corrupt replicas (this may be less than the total number of corrupt 
 replicas, since a block can have multiple corrupt replicas). The second will 
 allow paging through a list of block ids that have corrupt replicas:
 {{public synchronized List<Long> getCorruptReplicaBlockIds(int n, Long 
 startingBlockId)}}
 {{n}} is the number of block ids to return and {{startingBlockId}} is the 
 block id offset. To prevent a large number of items being returned at one 
 time, {{n}} is constrained to 0 <= {{n}} <= 100. If {{startingBlockId}} is 
 null, up to {{n}} items are returned starting at the beginning of the list. 
 Ordering is enforced through the internal use of a TreeMap in 
 CorruptReplicasMap.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.