[jira] Assigned: (HDFS-492) Expose corrupt replica/block information

2009-07-22 Thread Bill Zeller (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Zeller reassigned HDFS-492:


Assignee: Bill Zeller

> Expose corrupt replica/block information
> 
>
> Key: HDFS-492
> URL: https://issues.apache.org/jira/browse/HDFS-492
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Affects Versions: 0.21.0
>Reporter: Bill Zeller
>Assignee: Bill Zeller
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: hdfs-492-4.patch, hdfs-492-5.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> This adds two functions to FSNamesystem that provide more information 
> about corrupt replicas. It also adds two servlets to the namenode that 
> provide information (in JSON) about all blocks with corrupt replicas, as 
> well as information about a specific block, and it changes the file 
> browsing servlet by linking each block id to the above-mentioned block 
> information page.
> These JSON pages are designed to be used by client-side tools that wish 
> to analyze corrupt blocks/replicas. The only change to an existing 
> (non-servlet) class is described below.  
> Currently, CorruptReplicasMap stores a map of corrupt replica information 
> and allows insertion and deletion. It also gives information about the 
> corrupt replicas for a specific block, but it does not allow iteration 
> over all corrupt blocks. Two functions will be added to FSNamesystem 
> (which will call BlockManager, which will call CorruptReplicasMap). The 
> first will return the size of the corrupt replicas map, i.e. the number 
> of blocks that have corrupt replicas (which is less than the total number 
> of corrupt replicas whenever a block has more than one). The second will 
> allow "paging" through the list of block ids that have corrupt replicas:
> {{public synchronized List getCorruptReplicaBlockIds(int n, Long 
> startingBlockId)}}
> {{n}} is the number of block ids to return and {{startingBlockId}} is the 
> block id offset. To prevent a large number of items from being returned 
> at once, {{n}} is constrained to 0 <= {{n}} <= 100. If {{startingBlockId}} 
> is null, up to {{n}} items are returned starting at the beginning of the 
> list. Ordering is guaranteed by the internal use of TreeMap in 
> CorruptReplicasMap.
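The paging contract described above could be sketched roughly as follows. This is a simplified, hypothetical stand-in (class, field, and method names are illustrative, not the actual CorruptReplicasMap), and it assumes the {{startingBlockId}} offset is exclusive, i.e. paging resumes just past the last id returned:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified sketch of the paging API described above.
// A TreeMap keyed by block id provides the ordering guarantee.
public class CorruptReplicaPager {
    private final TreeMap<Long, String> corruptReplicas = new TreeMap<>();

    public void markCorrupt(long blockId, String datanodeName) {
        corruptReplicas.put(blockId, datanodeName);
    }

    // Returns up to n block ids; null startingBlockId means "from the
    // beginning", otherwise resume strictly after the given id.
    public synchronized List<Long> getCorruptReplicaBlockIds(int n, Long startingBlockId) {
        if (n < 0 || n > 100) {
            throw new IllegalArgumentException("n must satisfy 0 <= n <= 100");
        }
        List<Long> ids = new ArrayList<>();
        // tailMap(key, false) starts strictly after the offset block id.
        Iterable<Long> ordered = (startingBlockId == null)
            ? corruptReplicas.keySet()
            : corruptReplicas.tailMap(startingBlockId, false).keySet();
        for (Long id : ordered) {
            if (ids.size() >= n) {
                break;
            }
            ids.add(id);
        }
        return ids;
    }
}
```

A caller would page through the full list by passing the last id of each page as the next call's {{startingBlockId}}.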

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur reassigned HDFS-167:
-

Assignee: Bill Zeller

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Assignee: Bill Zeller
>Priority: Minor
>
> I encountered a bug when trying to upload data using the Hadoop DFS Client.  
> After receiving a NotReplicatedYetException, the DFSClient will normally 
> retry its upload up to some limited number of times.  In this case, I found 
> that this retry loop continued indefinitely, to the point that the number of 
> tries remaining was negative:
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
> replication for 21 seconds
> 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
> NotReplicatedYetException sleeping 
> /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
> 0325_us/logs_20090325_us_13 retries left -1
> The stack trace for the failure that's retrying is:
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.NotReplicated
> YetException: Not replicated yet:
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Client.call(Client.java:697)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)




[jira] Updated: (HDFS-492) Expose corrupt replica/block information

2009-07-22 Thread Bill Zeller (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Zeller updated HDFS-492:
-

Status: Open  (was: Patch Available)

> Expose corrupt replica/block information
> 
>
> Key: HDFS-492
> URL: https://issues.apache.org/jira/browse/HDFS-492
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Affects Versions: 0.21.0
>Reporter: Bill Zeller
>Priority: Minor
> Attachments: hdfs-492-4.patch, hdfs-492-5.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>




[jira] Updated: (HDFS-492) Expose corrupt replica/block information

2009-07-22 Thread Bill Zeller (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Zeller updated HDFS-492:
-

Fix Version/s: 0.21.0
   Status: Patch Available  (was: Open)

> Expose corrupt replica/block information
> 
>
> Key: HDFS-492
> URL: https://issues.apache.org/jira/browse/HDFS-492
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Affects Versions: 0.21.0
>Reporter: Bill Zeller
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: hdfs-492-4.patch, hdfs-492-5.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>




[jira] Updated: (HDFS-492) Expose corrupt replica/block information

2009-07-22 Thread Bill Zeller (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Zeller updated HDFS-492:
-

Attachment: hdfs-492-5.patch

> Expose corrupt replica/block information
> 
>
> Key: HDFS-492
> URL: https://issues.apache.org/jira/browse/HDFS-492
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Affects Versions: 0.21.0
>Reporter: Bill Zeller
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: hdfs-492-4.patch, hdfs-492-5.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>




[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread Bill Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734330#action_12734330
 ] 

Bill Zeller commented on HDFS-167:
--

Hi -- yes, I'll be submitting a patch for this. The issue is currently 
unassigned because I don't have permission to assign it to myself. There 
appears to be no easy way to exercise this through unit tests (without 
actually starting a namenode and datanode), so I'd like to modify the 
class to make direct unit testing easier. 

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Priority: Minor
>




[jira] Updated: (HDFS-498) Add development guide and framework documentation

2009-07-22 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-498:


Description: The framework documentation and aspects development guide 
need to be written in Forrest XML and then checked in under 
src/docs/src/documentation/content/xdocs

> Add development guide and framework documentation
> -
>
> Key: HDFS-498
> URL: https://issues.apache.org/jira/browse/HDFS-498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
>
> The framework documentation and aspects development guide need to be 
> written in Forrest XML and then checked in under 
> src/docs/src/documentation/content/xdocs




[jira] Created: (HDFS-498) Add development guide and framework documentation

2009-07-22 Thread Konstantin Boudnik (JIRA)
Add development guide and framework documentation
-

 Key: HDFS-498
 URL: https://issues.apache.org/jira/browse/HDFS-498
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik







[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734313#action_12734313
 ] 

Konstantin Boudnik commented on HDFS-435:
-

The PDF format was chosen for the convenience of future readers. I think 
your suggestion is valid -- I'll open a separate JIRA for this and take 
care of the conversion.

> Add orthogonal fault injection mechanism/framework
> --
>
> Key: HDFS-435
> URL: https://issues.apache.org/jira/browse/HDFS-435
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: Fault injection development guide and Framework 
> HowTo.pdf, Fault injection development guide and Framework HowTo.pdf
>
>
> It'd be great to have a fault injection mechanism for Hadoop.
> Having such a solution in place would let us increase test coverage of 
> error-handling and recovery mechanisms, reduce reproduction time, and 
> increase the reproduction rate of problems.
> Ideally, the system should be orthogonal to the current code and test 
> base: faults would be injected at build time and would be configurable, 
> e.g. all faults could be turned off, or only some of them allowed to 
> fire. Fault injection also has to be kept separate from the production 
> build. 
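One simple way to make injected faults configurable at run time is to gate each injection point on a probability read from a system property. This is a minimal sketch of that idea only, not the actual framework attached to this issue; the {{fi.*}} property naming is an invented convention for illustration:

```java
import java.util.Random;

// Sketch of configurable fault injection: each named fault fires with a
// probability read from a system property, so every fault defaults to "off".
// Keeping the injected hooks out of the production build (e.g. by weaving
// them only in a separate test build) is a build-system concern, not shown.
public class FaultProbability {
    private static final Random RANDOM = new Random();

    // Returns true when the named fault should fire, e.g.
    // -Dfi.datanode.receiveBlock=0.1 makes that fault fire ~10% of the time.
    public static boolean shouldFire(String faultName) {
        String prop = System.getProperty("fi." + faultName);
        if (prop == null) {
            return false; // fault disabled by default
        }
        return RANDOM.nextDouble() < Double.parseDouble(prop);
    }
}
```

An injection point in test-build code would then read `if (FaultProbability.shouldFire("datanode.receiveBlock")) throw new IOException("injected fault");`.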




[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734308#action_12734308
 ] 

dhruba borthakur commented on HDFS-435:
---

> Dhruba, where should we put the doc? Any idea?

Docs typically go into src/docs, but we want the doc in an editable, open 
format, and PDF is not editable. One option is to convert it to Forrest 
XML and check it into src/docs/src/documentation/content/xdocs. 


> Add orthogonal fault injection mechanism/framework
> --
>
> Key: HDFS-435
> URL: https://issues.apache.org/jira/browse/HDFS-435
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: Fault injection development guide and Framework 
> HowTo.pdf, Fault injection development guide and Framework HowTo.pdf
>
>




[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734307#action_12734307
 ] 

dhruba borthakur commented on HDFS-167:
---

Hi Bill, will it be possible for you to submit this as a patch and a unit test? 
Details are here : http://wiki.apache.org/hadoop/HowToContribute

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Priority: Minor
>




[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread Bill Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734275#action_12734275
 ] 

Bill Zeller commented on HDFS-167:
--

The above code should be:
{code:title=org.apache.hadoop.hdfs.DFSClient::locateFollowingBlock|borderStyle=solid}
if (--retries == 0 && 
!NotReplicatedYetException.class.getName().
equals(e.getClassName())) {
throw e;
}
{code} 

(Sorry about the repost)

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Priority: Minor
>




[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread Bill Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734265#action_12734265
 ] 

Bill Zeller commented on HDFS-167:
--

The offending code:

{quote}
if (--retries == 0 && 
!NotReplicatedYetException.class.getName().
equals(e.getClassName())) {
throw e;
}
{quote}

This code retries until the above condition is met. As written, it only 
throws {{e}} when the retry count has reached 0 *and* the exception is not 
a {{NotReplicatedYetException}}; the code that follows then assumes any 
exception that was not rethrown is a {{NotReplicatedYetException}}. The 
intent appears to be to retry a bounded number of times on a 
{{NotReplicatedYetException}} and to rethrow any other type of exception 
immediately, so the {{&&}} in the if statement should be changed to an 
{{||}}.
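With {{||}}, the condition rethrows when *either* the retries are exhausted *or* the exception is not a NotReplicatedYetException. A hedged sketch of that corrected logic, using simplified stand-in names rather than the actual DFSClient code:

```java
// Sketch of the proposed fix, not the actual DFSClient source: rethrow
// when EITHER the retry budget is spent OR the exception is of a type
// other than NotReplicatedYetException.
public class RetryConditionSketch {
    static final String NOT_REPLICATED_YET =
        "org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException";

    // retriesRemaining is the count after the --retries decrement.
    static boolean shouldRethrow(int retriesRemaining, String exceptionClassName) {
        // Buggy version used &&: a NotReplicatedYetException kept the loop
        // retrying forever, driving the counter negative as seen in the log.
        return retriesRemaining == 0
            || !NOT_REPLICATED_YET.equals(exceptionClassName);
    }
}
```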

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Priority: Minor
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-497) One of the DFSClient::create functions ignores parameter

2009-07-22 Thread Bill Zeller (JIRA)
One of the DFSClient::create functions ignores parameter


 Key: HDFS-497
 URL: https://issues.apache.org/jira/browse/HDFS-497
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.1
Reporter: Bill Zeller
Priority: Minor


DFSClient::create(String src, boolean overwrite, Progressable progress) ignores 
the progress parameter.
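The shape of the bug is an overload that drops an argument when delegating. A hypothetical sketch (stand-in names; the real DFSClient overload chain and Progressable interface differ):

```java
public class CreateDemo {
    // Stand-in for org.apache.hadoop.util.Progressable.
    interface Progressable { void progress(); }

    static String lastProgressSeen;

    // Stand-in for the fullest create(...) overload.
    static void createFull(String src, boolean overwrite, Progressable progress) {
        lastProgressSeen = (progress == null) ? "dropped" : "forwarded";
    }

    // Buggy shape: accepts a progress callback but forwards null.
    static void createBuggy(String src, boolean overwrite, Progressable progress) {
        createFull(src, overwrite, null);
    }

    // Fixed shape: forwards the caller's progress callback.
    static void createFixed(String src, boolean overwrite, Progressable progress) {
        createFull(src, overwrite, progress);
    }

    public static void main(String[] args) {
        Progressable p = () -> {};
        createBuggy("/tmp/f", true, p);
        System.out.println(lastProgressSeen); // dropped
        createFixed("/tmp/f", true, p);
        System.out.println(lastProgressSeen); // forwarded
    }
}
```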

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-265) Revisit append

2009-07-22 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734237#action_12734237
 ] 

Hairong Kuang commented on HDFS-265:


Thanks Cos and Nicholas for the test plan. All tests are very good. I have two 
comments:
1. Release 0.21 will focus more on hflush than append. Could we add tests on 
hflush?
2. Could we have some performance/scalability tests to make sure that the 
implementation does not cause any performance/scalability regression?

> Revisit append
> --
>
> Key: HDFS-265
> URL: https://issues.apache.org/jira/browse/HDFS-265
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Attachments: appendDesign.pdf, appendDesign.pdf, appendDesign1.pdf, 
> AppendSpec.pdf, TestPlanAppend.html
>
>
> HADOOP-1700 and related issues put a lot of effort into providing the first 
> implementation of append. However, append is a complex feature, and it turns 
> out that some issues that initially seemed trivial need careful design. This 
> jira revisits append, aiming for a design and implementation supporting 
> semantics that are acceptable to its users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-265) Revisit append

2009-07-22 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734235#action_12734235
 ] 

Hairong Kuang commented on HDFS-265:


In this design, a new generation stamp is always fetched from NameNode before a 
new pipeline is set up when handling errors. So if an access token is also 
fetched along with the generation stamp, things should be OK.

> Revisit append
> --
>
> Key: HDFS-265
> URL: https://issues.apache.org/jira/browse/HDFS-265
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Attachments: appendDesign.pdf, appendDesign.pdf, appendDesign1.pdf, 
> AppendSpec.pdf, TestPlanAppend.html
>
>
> HADOOP-1700 and related issues put a lot of effort into providing the first 
> implementation of append. However, append is a complex feature, and it turns 
> out that some issues that initially seemed trivial need careful design. This 
> jira revisits append, aiming for a design and implementation supporting 
> semantics that are acceptable to its users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-496) Use PureJavaCrc32 in HDFS

2009-07-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-496:


 Component/s: hdfs client
  data-node
Hadoop Flags: [Reviewed]

+1 patch looks good.

> Use PureJavaCrc32 in HDFS
> -
>
> Key: HDFS-496
> URL: https://issues.apache.org/jira/browse/HDFS-496
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-496.txt
>
>
> Common now has a pure java CRC32 implementation which is more efficient than 
> java.util.zip.CRC32. This issue is to make use of it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734211#action_12734211
 ] 

Konstantin Boudnik commented on HDFS-435:
-

I'd imagine that we can check the test-related documents and examples into 
src/docs/test so that they would be generated into HDFS's documentation by Forrest.

Also, it'd be great to have a set of examples to demonstrate different 
techniques of fault injection development for tests. Nicholas' HDFS-483 is a 
great start! 

> Add orthogonal fault injection mechanism/framework
> --
>
> Key: HDFS-435
> URL: https://issues.apache.org/jira/browse/HDFS-435
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: Fault injection development guide and Framework 
> HowTo.pdf, Fault injection development guide and Framework HowTo.pdf
>
>
> It'd be great to have a fault injection mechanism for Hadoop.
> Having such a solution in place will allow us to increase test coverage of 
> error handling and recovery mechanisms, reduce reproduction time, and increase 
> the reproduction rate of problems.
> Ideally, the system has to be orthogonal to the current code and test base. 
> E.g., faults have to be injected at build time and have to be configurable: 
> all faults could be turned off, or only some of them allowed to happen. Also, 
> fault injection has to be separated from the production build. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734205#action_12734205
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-435:
-

Yes, the guide is very useful for aop test development.  We should check in the 
doc.

Dhruba, where should we put the doc?  Any idea?

> Add orthogonal fault injection mechanism/framework
> --
>
> Key: HDFS-435
> URL: https://issues.apache.org/jira/browse/HDFS-435
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: Fault injection development guide and Framework 
> HowTo.pdf, Fault injection development guide and Framework HowTo.pdf
>
>
> It'd be great to have a fault injection mechanism for Hadoop.
> Having such a solution in place will allow us to increase test coverage of 
> error handling and recovery mechanisms, reduce reproduction time, and increase 
> the reproduction rate of problems.
> Ideally, the system has to be orthogonal to the current code and test base. 
> E.g., faults have to be injected at build time and have to be configurable: 
> all faults could be turned off, or only some of them allowed to happen. Also, 
> fault injection has to be separated from the production build. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-435:


Attachment: Fault injection development guide and Framework HowTo.pdf

The document is updated with additional explanations of aspect specifics such as 
pointcuts.
Thank you for the review, Dhruba!

> Add orthogonal fault injection mechanism/framework
> --
>
> Key: HDFS-435
> URL: https://issues.apache.org/jira/browse/HDFS-435
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: Fault injection development guide and Framework 
> HowTo.pdf, Fault injection development guide and Framework HowTo.pdf
>
>
> It'd be great to have a fault injection mechanism for Hadoop.
> Having such a solution in place will allow us to increase test coverage of 
> error handling and recovery mechanisms, reduce reproduction time, and increase 
> the reproduction rate of problems.
> Ideally, the system has to be orthogonal to the current code and test base. 
> E.g., faults have to be injected at build time and have to be configurable: 
> all faults could be turned off, or only some of them allowed to happen. Also, 
> fault injection has to be separated from the production build. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734125#action_12734125
 ] 

dhruba borthakur commented on HDFS-200:
---

Hi Ruyue, your option of excluding specific datanodes (specified by the client) 
sounds reasonable. This might help in the case of network partitioning where a 
specific client loses access to a set of datanodes while the datanodes are alive 
and well and able to send heartbeats to the namenode. Can you please create a 
separate JIRA for your proposed fix and attach your patch there? Thanks.

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, 
> fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, 
> fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java, 
> Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-496) Use PureJavaCrc32 in HDFS

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734121#action_12734121
 ] 

dhruba borthakur commented on HDFS-496:
---

For the record: PureJavaCrc32 computes the same CRC value as the current 
implementation, so this patch does not change the HDFS data format. Can you 
please link this with the one in the common project, because that JIRA has the 
performance numbers.
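The compatibility claim is easy to check: any pure-Java implementation must agree bit-for-bit with java.util.zip.CRC32 (the reflected polynomial 0xEDB88320, init and final XOR of all ones). A minimal bitwise sketch of the reference algorithm (the real PureJavaCrc32 is table-driven for speed; this is only to demonstrate the agreement):

```java
import java.util.zip.CRC32;

public class CrcDemo {
    // Minimal bit-at-a-time CRC-32 with the zlib/java.util.zip parameters.
    // Any faster implementation must produce identical values so that
    // on-disk HDFS checksums remain compatible.
    static long softwareCrc32(byte[] data) {
        int crc = 0xFFFFFFFF;                       // initial value
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                // XOR the reflected polynomial when the low bit is set.
                crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
            }
        }
        return (~crc) & 0xFFFFFFFFL;                // final XOR
    }

    public static void main(String[] args) {
        byte[] data = "hello hdfs".getBytes();
        CRC32 builtin = new CRC32();
        builtin.update(data);
        System.out.println(builtin.getValue() == softwareCrc32(data)); // true
    }
}
```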

> Use PureJavaCrc32 in HDFS
> -
>
> Key: HDFS-496
> URL: https://issues.apache.org/jira/browse/HDFS-496
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-496.txt
>
>
> Common now has a pure java CRC32 implementation which is more efficient than 
> java.util.zip.CRC32. This issue is to make use of it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734114#action_12734114
 ] 

dhruba borthakur commented on HDFS-435:
---

Very cool stuff! And the guide is very helpful. I have some questions from the 
user gide.

{quote}
pointcut callReceivePacket() :
    call (* OutputStream.write(..))
    && withincode (* BlockReceiver.receivePacket(..))
    // to further limit the application of this aspect a very narrow
    // 'target' can be used as follows
    //  && target(DataOutputStream)
    && !within(BlockReceiverAspects +);
{quote}

Can you please explain the above code in detail, what it means, etc. Things like 
"pointcut" and "withincode", are these AspectJ constructs? What is the intention 
of the above code? Thanks.
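For context: {{pointcut}}, {{call}}, {{withincode}} and {{within}} are indeed AspectJ constructs. The pointcut quoted above selects calls to OutputStream.write(..) that occur inside BlockReceiver.receivePacket(..), excluding code in the aspect itself, so advice can inject a fault at exactly those call sites. In plain Java, the woven-in behavior is roughly equivalent to wrapping the stream (a hedged sketch of the idea, not the framework's actual weaving mechanism):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FaultInjectionDemo {
    // Roughly what the aspect's advice does once woven in: intercept each
    // write() and optionally throw an injected fault before delegating.
    static class FaultInjectingStream extends OutputStream {
        private final OutputStream delegate;
        private final double faultProbability;

        FaultInjectingStream(OutputStream delegate, double faultProbability) {
            this.delegate = delegate;
            this.faultProbability = faultProbability;
        }

        @Override
        public void write(int b) throws IOException {
            if (Math.random() < faultProbability) {
                throw new IOException("injected fault");
            }
            delegate.write(b);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Probability 0: all faults "turned off", writes pass through untouched.
        OutputStream out = new FaultInjectingStream(sink, 0.0);
        out.write('x');
        System.out.println(sink.size()); // 1
    }
}
```

The advantage of the AspectJ approach over hand-written wrappers like this is exactly the orthogonality the issue asks for: the production classes stay untouched, and the interception is added (or omitted) at build time.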


> Add orthogonal fault injection mechanism/framework
> --
>
> Key: HDFS-435
> URL: https://issues.apache.org/jira/browse/HDFS-435
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: Fault injection development guide and Framework HowTo.pdf
>
>
> It'd be great to have a fault injection mechanism for Hadoop.
> Having such a solution in place will allow us to increase test coverage of 
> error handling and recovery mechanisms, reduce reproduction time, and increase 
> the reproduction rate of problems.
> Ideally, the system has to be orthogonal to the current code and test base. 
> E.g., faults have to be injected at build time and have to be configurable: 
> all faults could be turned off, or only some of them allowed to happen. Also, 
> fault injection has to be separated from the production build. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.