[jira] Updated: (HDFS-1536) Improve HDFS WebUI

2010-12-14 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1536:


Attachment: missingBlocksWebUI1.patch

missingBlocksWebUI1.patch addresses Nigel's comments.

 Improve HDFS WebUI
 --

 Key: HDFS-1536
 URL: https://issues.apache.org/jira/browse/HDFS-1536
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.23.0

 Attachments: missingBlocksWebUI.patch, missingBlocksWebUI1.patch


 1. Make the missing blocks count accurate;
 2. Make the under-replicated blocks count exclude missing blocks.
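
 As a rough illustration of the two counts (this is not the patch itself), the
 sketch below classifies blocks by live replica count. It assumes that a block
 with no live replicas counts as missing, and that only blocks with at least one
 but fewer than the expected number of replicas count as under-replicated. The
 BlockCounts and BlockInfo names are hypothetical.

```java
import java.util.List;

class BlockCounts {
    /** Hypothetical per-block summary: live replicas found vs. replicas expected. */
    static final class BlockInfo {
        final int liveReplicas;
        final int expectedReplicas;

        BlockInfo(int liveReplicas, int expectedReplicas) {
            this.liveReplicas = liveReplicas;
            this.expectedReplicas = expectedReplicas;
        }
    }

    final long missing;
    final long underReplicated;

    BlockCounts(List<BlockInfo> blocks) {
        long m = 0;
        long u = 0;
        for (BlockInfo b : blocks) {
            if (b.liveReplicas == 0) {
                m++; // no live replica: counted as missing only
            } else if (b.liveReplicas < b.expectedReplicas) {
                u++; // some replicas exist, but fewer than expected
            }
        }
        missing = m;
        underReplicated = u;
    }

    public static void main(String[] args) {
        BlockCounts c = new BlockCounts(List.of(
                new BlockInfo(0, 3), new BlockInfo(1, 3), new BlockInfo(3, 3)));
        System.out.println("missing=" + c.missing
                + " underReplicated=" + c.underReplicated);
    }
}
```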

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1521) Persist transaction ID on disk between NN restarts

2010-12-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1521:
--

Attachment: hdfs-1521.3.txt

This patch switches to logging a txid for every edit and verifying strict 
sequential ordering on load.

I also left the txid in the header - it seemed to me this is advantageous just 
as something that *must* be there at the top of every edit file. If others 
disagree we can take it out.

Added some basic tests as well to ensure we can still read the old format.
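
For readers who want a picture of the "strict sequential ordering" check, here 
is a minimal sketch. It is not taken from the patch: the TxidOrderCheck class, 
its method name and the use of an in-memory list of txids are all assumptions 
made for illustration.

```java
import java.util.List;

class TxidOrderCheck {
    /**
     * Verifies that txids increase by exactly one, starting at firstExpected.
     * Returns the last txid seen (firstExpected - 1 if the list is empty).
     */
    static long verifySequential(long firstExpected, List<Long> txids) {
        long expected = firstExpected;
        for (long txid : txids) {
            if (txid != expected) {
                // A gap, repeat or reordering means the log cannot be trusted.
                throw new IllegalStateException(
                        "Expected txid " + expected + " but read " + txid);
            }
            expected++;
        }
        return expected - 1;
    }

    public static void main(String[] args) {
        System.out.println(verifySequential(5, List.of(5L, 6L, 7L)));
    }
}
```

A real loader would presumably read the txids from the edit file as it goes 
and raise an I/O error rather than an unchecked exception.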

 Persist transaction ID on disk between NN restarts
 --

 Key: HDFS-1521
 URL: https://issues.apache.org/jira/browse/HDFS-1521
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: hdfs-1521.3.txt, hdfs-1521.txt, hdfs-1521.txt


 For HDFS-1073 and other future work, we'd like to have the concept of a 
 transaction ID that is persisted on disk with the image/edits. We already 
 have this concept in the NameNode but it resets to 0 on restart. We can also 
 use this txid to replace the _checkpointTime_ field, I believe.




[jira] Updated: (HDFS-1526) Dfs client name for a map/reduce task should have some randomness

2010-12-14 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1526:


  Resolution: Fixed
Release Note: A client name now has the format 
DFSClient_applicationid_randomint_threadid, where applicationid is 
mapred.task.id if set, or else NONMAPREDUCE.
Hadoop Flags: [Incompatible change, Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this!

 Dfs client name for a map/reduce task should have some randomness
 -

 Key: HDFS-1526
 URL: https://issues.apache.org/jira/browse/HDFS-1526
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.23.0

 Attachments: clientName.patch, randClientId1.patch, 
 randClientId2.patch, randClientId3.patch


 Fsck shows one of the files in our dfs cluster is corrupt.
 /bin/hadoop fsck aFile -files -blocks -locations
 aFile: 4633 bytes, 2 block(s): 
 aFile: CORRUPT block blk_-4597378336099313975
 OK
 0. blk_-4597378336099313975_2284630101 len=0 repl=3 [...]
 1. blk_5024052590403223424_2284630107 len=4633 repl=3 [...]Status: CORRUPT
 On disk, these two blocks have the same size and the same content. It turns 
 out the writer of the file is a multithreaded map task. Each thread 
 may write to the same file. One possible interleaving of two threads that 
 could cause this:
 [T1: create aFile] [T2: delete aFile] [T2: create aFile] [T1: addBlock 0 to 
 aFile] [T2: addBlock 1 to aFile]...
 Because T1 and T2 have the same client name, which is the map task id, the 
 above interactions can happen without any lease exception, eventually 
 leading to a corrupt file. To solve the problem, a mapreduce task's client 
 name could be formed from its task id followed by a random number.
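
 A minimal sketch of building such a name, following the
 DFSClient_applicationid_randomint_threadid format from the release note in
 this message. The ClientNames class and its method signature are hypothetical,
 and the task id used in main is a made-up value; the committed code may
 differ.

```java
import java.util.Random;

class ClientNames {
    /** Builds DFSClient_<applicationid>_<randomint>_<threadid>. */
    static String clientName(String taskId, Random random, long threadId) {
        // Fall back to NONMAPREDUCE when no task id is available.
        String applicationId = (taskId != null) ? taskId : "NONMAPREDUCE";
        return "DFSClient_" + applicationId + "_" + random.nextInt() + "_" + threadId;
    }

    public static void main(String[] args) {
        long tid = Thread.currentThread().getId();
        // In a real task the id would come from the mapred.task.id property.
        System.out.println(clientName("attempt_0001_m_000001_0", new Random(), tid));
        System.out.println(clientName(null, new Random(), tid));
    }
}
```

 With a random component in the name, two writers in the same task are
 unlikely to share a client name, so the lease check has a chance to reject
 the second writer.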




[jira] Commented: (HDFS-1360) TestBlockRecovery should bind ephemeral ports

2010-12-14 Thread Patrick Kling (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971407#action_12971407
 ] 

Patrick Kling commented on HDFS-1360:
-

+1 I just tested this and it fixes the problem I was seeing.

 TestBlockRecovery should bind ephemeral ports
 -

 Key: HDFS-1360
 URL: https://issues.apache.org/jira/browse/HDFS-1360
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: hdfs-1360.txt


 TestBlockRecovery starts up a DN, but doesn't configure the various ports to 
 be ephemeral, so the test fails if run on a machine where another DN is 
 already running.
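
 For reference, "ephemeral" here means asking the OS for any free port by
 binding to port 0. The sketch below shows the idea with plain java.net
 sockets; it is not the test fix itself, and the EphemeralPortDemo class is
 hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

class EphemeralPortDemo {
    /** Binds to port 0 so the OS assigns a free port, then reports which one. */
    static int bindEphemeral() throws IOException {
        try (ServerSocket socket =
                     new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OS-assigned port: " + bindEphemeral());
    }
}
```

 Because the port is chosen at bind time, it cannot collide with a fixed port
 already held by another DN on the same machine.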




[jira] Updated: (HDFS-1476) listCorruptFileBlocks should be functional while the name node is still in safe mode

2010-12-14 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1476:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Patrick!

 listCorruptFileBlocks should be functional while the name node is still in 
 safe mode
 

 Key: HDFS-1476
 URL: https://issues.apache.org/jira/browse/HDFS-1476
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Patrick Kling
Assignee: Patrick Kling
 Fix For: 0.23.0

 Attachments: HDFS-1476.2.patch, HDFS-1476.3.patch, HDFS-1476.4.patch, 
 HDFS-1476.5.patch, HDFS-1476.patch


 This would allow us to detect whether missing blocks can be fixed using Raid 
 and if that is the case exit safe mode earlier.
 One way to make listCorruptFileBlocks available before the name node has 
 exited from safe mode would be to perform a scan of the blocks map on each 
 call to listCorruptFileBlocks to determine if there are any blocks with no 
 replicas. This scan could be parallelized by dividing the space of block IDs 
 into multiple intervals that can be scanned independently.
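
 The interval idea could look roughly like the sketch below. It is only an
 illustration: a sorted in-memory map from block ID to replica count stands in
 for the blocks map, and the CorruptBlockScan class and its method are
 hypothetical. The map must not be modified while the scan runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CorruptBlockScan {
    /**
     * Returns the IDs of blocks with no replicas. The 64-bit ID space is split
     * into `parts` contiguous intervals, each scanned by its own task.
     */
    static List<Long> blocksWithNoReplicas(NavigableMap<Long, Integer> blocks, int parts)
            throws InterruptedException, ExecutionException {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // Interval width: (2^64 - 1) / parts, computed as an unsigned division.
        long step = Long.divideUnsigned(-1L, parts);
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int i = 0; i < parts; i++) {
                long lo = Long.MIN_VALUE + i * step;
                boolean last = (i == parts - 1);
                long hi = last ? Long.MAX_VALUE : lo + step;
                // The last interval includes its upper bound; the others do not.
                NavigableMap<Long, Integer> slice = blocks.subMap(lo, true, hi, last);
                futures.add(pool.submit(() -> {
                    List<Long> found = new ArrayList<>();
                    for (Map.Entry<Long, Integer> e : slice.entrySet()) {
                        if (e.getValue() == 0) {
                            found.add(e.getKey());
                        }
                    }
                    return found;
                }));
            }
            List<Long> result = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                result.addAll(f.get()); // intervals are collected in ID order
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        NavigableMap<Long, Integer> blocks = new TreeMap<>();
        blocks.put(-10L, 0);
        blocks.put(7L, 3);
        blocks.put(99L, 0);
        System.out.println(blocksWithNoReplicas(blocks, 4));
    }
}
```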




[jira] Updated: (HDFS-1360) TestBlockRecovery should bind ephemeral ports

2010-12-14 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1360:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Todd!

 TestBlockRecovery should bind ephemeral ports
 -

 Key: HDFS-1360
 URL: https://issues.apache.org/jira/browse/HDFS-1360
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: hdfs-1360.txt


 TestBlockRecovery starts up a DN, but doesn't configure the various ports to 
 be ephemeral, so the test fails if run on a machine where another DN is 
 already running.




[jira] Updated: (HDFS-1360) TestBlockRecovery should bind ephemeral ports

2010-12-14 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1360:


Fix Version/s: 0.23.0

 TestBlockRecovery should bind ephemeral ports
 -

 Key: HDFS-1360
 URL: https://issues.apache.org/jira/browse/HDFS-1360
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Fix For: 0.23.0

 Attachments: hdfs-1360.txt


 TestBlockRecovery starts up a DN, but doesn't configure the various ports to 
 be ephemeral, so the test fails if run on a machine where another DN is 
 already running.




[jira] Created: (HDFS-1537) Add a metrics for tracking the number of reported corrupt replicas

2010-12-14 Thread Hairong Kuang (JIRA)
Add a metrics for tracking the number of reported corrupt replicas
--

 Key: HDFS-1537
 URL: https://issues.apache.org/jira/browse/HDFS-1537
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.23.0


We have a cluster in which some of the datanodes' disks are corrupt, but it took us a 
few days to become aware of the problem. Adding a metric that keeps track of the 
number of reported corrupt replicas would allow us to raise an alert when 
an unusual number of corrupt replicas is reported.
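
A very rough sketch of the idea, independent of the Hadoop metrics framework: 
a thread-safe counter that is bumped on every report, plus a threshold check 
that a monitoring job could poll. The CorruptReplicaMetric class and the 
notion of a fixed alert threshold are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

class CorruptReplicaMetric {
    private final AtomicLong reported = new AtomicLong();
    private final long alertThreshold;

    CorruptReplicaMetric(long alertThreshold) {
        this.alertThreshold = alertThreshold;
    }

    /** Called whenever a corrupt replica is reported. */
    void recordCorruptReplica() {
        reported.incrementAndGet();
    }

    long getReported() {
        return reported.get();
    }

    /** True once the number of reports exceeds the alert threshold. */
    boolean shouldAlert() {
        return reported.get() > alertThreshold;
    }

    public static void main(String[] args) {
        CorruptReplicaMetric metric = new CorruptReplicaMetric(2);
        for (int i = 0; i < 3; i++) {
            metric.recordCorruptReplica();
        }
        System.out.println(metric.getReported() + " reported, alert=" + metric.shouldAlert());
    }
}
```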




[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially

2010-12-14 Thread Erik Steffl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Steffl updated HDFS-1448:
--

Attachment: HDFS-1448-0.22-4.patch

 Create multi-format parser for edits logs file, support binary and XML 
 formats initially
 

 Key: HDFS-1448
 URL: https://issues.apache.org/jira/browse/HDFS-1448
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.22.0
Reporter: Erik Steffl
Assignee: Erik Steffl
 Fix For: 0.22.0

 Attachments: editsStored, HDFS-1448-0.22-1.patch, 
 HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, 
 HDFS-1448-0.22.patch, Viewer hierarchy.pdf


 Create multi-format parser for edits logs file, support binary and XML 
 formats initially.
 Parsing should work from any supported format to any other supported format 
 (e.g. from binary to XML and from XML to binary).
 The binary format is the format used by the FSEditLog class to read/write the 
 edits file.
 The primary reason to develop this tool is to help with troubleshooting: the 
 binary format is hard to read and edit (for human troubleshooters).
 Longer term it could be used to clean up and minimize parsers for fsimage and 
 edits files. Edits parser OfflineEditsViewer is written in a very similar 
 fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer 
 and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This 
 is subject to change, specifically depending on adoption of avro (which would 
 completely change how objects are serialized as well as provide ways to 
 convert files to different formats).
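
 To make the binary-to-XML direction concrete, here is a toy sketch. The
 record layout (one opcode byte followed by a UTF-encoded path), the XML
 element names and the EditsToXml class are all invented for illustration;
 they are not the FSEditLog format or the OfflineEditsViewer design.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

class EditsToXml {
    /** Reads (opcode byte, UTF path) records until EOF and renders them as XML. */
    static String binaryToXml(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        StringBuilder xml = new StringBuilder("<EDITS>\n");
        while (true) {
            int opcode = data.read(); // -1 signals a clean end of stream
            if (opcode < 0) {
                break;
            }
            String path = data.readUTF();
            xml.append("  <RECORD><OPCODE>").append(opcode)
               .append("</OPCODE><PATH>").append(escape(path))
               .append("</PATH></RECORD>\n");
        }
        return xml.append("</EDITS>\n").toString();
    }

    /** Escapes the characters that would break the XML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(0);
        out.writeUTF("/user/a");
        out.writeByte(2);
        out.writeUTF("/user/b");
        System.out.print(binaryToXml(new ByteArrayInputStream(bytes.toByteArray())));
    }
}
```

 The reverse direction would parse the XML and write the same records back
 with a DataOutputStream.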




[jira] Commented: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially

2010-12-14 Thread Erik Steffl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971486#action_12971486
 ] 

Erik Steffl commented on HDFS-1448:
---

HDFS-1448-0.22-4.patch addresses the points in the review from 09/Dec/10 08:06 PM 
https://issues.apache.org/jira/browse/HDFS-1448?focusedCommentId=12970037&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12970037

 Create multi-format parser for edits logs file, support binary and XML 
 formats initially
 

 Key: HDFS-1448
 URL: https://issues.apache.org/jira/browse/HDFS-1448
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.22.0
Reporter: Erik Steffl
Assignee: Erik Steffl
 Fix For: 0.22.0

 Attachments: editsStored, HDFS-1448-0.22-1.patch, 
 HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, 
 HDFS-1448-0.22.patch, Viewer hierarchy.pdf


 Create multi-format parser for edits logs file, support binary and XML 
 formats initially.
 Parsing should work from any supported format to any other supported format 
 (e.g. from binary to XML and from XML to binary).
 The binary format is the format used by the FSEditLog class to read/write the 
 edits file.
 The primary reason to develop this tool is to help with troubleshooting: the 
 binary format is hard to read and edit (for human troubleshooters).
 Longer term it could be used to clean up and minimize parsers for fsimage and 
 edits files. Edits parser OfflineEditsViewer is written in a very similar 
 fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer 
 and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This 
 is subject to change, specifically depending on adoption of avro (which would 
 completely change how objects are serialized as well as provide ways to 
 convert files to different formats).




[jira] Updated: (HDFS-1477) Make NameNode Reconfigurable.

2010-12-14 Thread Patrick Kling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Kling updated HDFS-1477:


Attachment: HDFS-1477.2.patch

Updated patch, added unit test.

ant test-patch output:
{code}
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec]
 [exec] +1 system test framework.  The patch passed system test framework compile.
{code}

ant test failures (all these tests also fail on a clean trunk for me):
{code}
[junit] Test org.apache.hadoop.hdfs.TestHDFSServerPorts FAILED
[junit] Test org.apache.hadoop.hdfs.TestHDFSTrash FAILED (timeout)
[junit] Test org.apache.hadoop.hdfs.server.namenode.TestBackupNode FAILED
[junit] Test org.apache.hadoop.hdfs.server.namenode.TestStorageRestore FAILED
[junit] Test org.apache.hadoop.hdfs.TestFileConcurrentReader FAILED (timeout)
{code}

 Make NameNode Reconfigurable.
 -

 Key: HDFS-1477
 URL: https://issues.apache.org/jira/browse/HDFS-1477
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Patrick Kling
 Fix For: 0.23.0

 Attachments: HDFS-1477.2.patch, HDFS-1477.patch


 Modify NameNode to implement the interface Reconfigurable proposed in 
 HADOOP-7001. This would allow us to change certain configuration properties 
 without restarting the name node.
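
 As a sketch of what run-time reconfiguration could look like: the
 SimpleReconfigurable interface below is a hypothetical stand-in defined just
 for this example (the interface proposed in HADOOP-7001 may differ), and
 DemoNode and the property names are made up.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in; the interface proposed in HADOOP-7001 may differ. */
interface SimpleReconfigurable {
    boolean isPropertyReconfigurable(String property);

    void reconfigureProperty(String property, String newValue);
}

class DemoNode implements SimpleReconfigurable {
    private final Set<String> reconfigurable;
    private final Map<String, String> conf = new ConcurrentHashMap<>();

    DemoNode(Set<String> reconfigurable) {
        this.reconfigurable = reconfigurable;
    }

    @Override
    public boolean isPropertyReconfigurable(String property) {
        return reconfigurable.contains(property);
    }

    @Override
    public void reconfigureProperty(String property, String newValue) {
        if (!isPropertyReconfigurable(property)) {
            throw new IllegalArgumentException(
                    property + " cannot be changed at run time");
        }
        conf.put(property, newValue); // takes effect without a restart
    }

    String get(String property) {
        return conf.get(property);
    }

    public static void main(String[] args) {
        DemoNode node = new DemoNode(Set.of("demo.heartbeat.interval"));
        node.reconfigureProperty("demo.heartbeat.interval", "5");
        System.out.println(node.get("demo.heartbeat.interval"));
    }
}
```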




[jira] Commented: (HDFS-1477) Make NameNode Reconfigurable.

2010-12-14 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971535#action_12971535
 ] 

Jakob Homan commented on HDFS-1477:
---

The addInternalServlet call is used for machine-to-machine servlets that users 
won't use and are not authenticated (or are authenticated over kerberos).  This 
servlet should use the standard add call so that in a secure system the user 
will be authenticated.  Haven't reviewed the rest of the patch; just wanted to 
throw that out.

 Make NameNode Reconfigurable.
 -

 Key: HDFS-1477
 URL: https://issues.apache.org/jira/browse/HDFS-1477
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Patrick Kling
Assignee: Patrick Kling
 Fix For: 0.23.0

 Attachments: HDFS-1477.2.patch, HDFS-1477.patch


 Modify NameNode to implement the interface Reconfigurable proposed in 
 HADOOP-7001. This would allow us to change certain configuration properties 
 without restarting the name node.




[jira] Commented: (HDFS-733) TestBlockReport fails intermittently

2010-12-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971559#action_12971559
 ] 

Todd Lipcon commented on HDFS-733:
--

Still happening occasionally in trunk, same error as Eli posted above.

 TestBlockReport fails intermittently
 

 Key: HDFS-733
 URL: https://issues.apache.org/jira/browse/HDFS-733
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Suresh Srinivas
Assignee: Konstantin Boudnik
 Fix For: 0.22.0

 Attachments: HDFS-733.2.patch, HDFS-733.patch, HDFS-733.patch, 
 HDFS-733.patch, HDFS-733.patch


 Details at 
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/58/testReport/




[jira] Commented: (HDFS-1521) Persist transaction ID on disk between NN restarts

2010-12-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971560#action_12971560
 ] 

Todd Lipcon commented on HDFS-1521:
---

Current patch that I uploaded yesterday passes test-patch and unit tests, 
though there are one or two pretty trivial TODOs left in the patch, so I need 
to upload a final one with those addressed. Before I do so, any review comments?

 Persist transaction ID on disk between NN restarts
 --

 Key: HDFS-1521
 URL: https://issues.apache.org/jira/browse/HDFS-1521
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: hdfs-1521.3.txt, hdfs-1521.txt, hdfs-1521.txt


 For HDFS-1073 and other future work, we'd like to have the concept of a 
 transaction ID that is persisted on disk with the image/edits. We already 
 have this concept in the NameNode but it resets to 0 on restart. We can also 
 use this txid to replace the _checkpointTime_ field, I believe.




[jira] Commented: (HDFS-733) TestBlockReport fails intermittently

2010-12-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971563#action_12971563
 ] 

Konstantin Boudnik commented on HDFS-733:
-

It sure does. It seems that on some occasions the replica either needs more time 
to become TEMP, or this happens too fast, before the checking thread even kicks 
in... Damn nasty problem.

 TestBlockReport fails intermittently
 

 Key: HDFS-733
 URL: https://issues.apache.org/jira/browse/HDFS-733
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Suresh Srinivas
Assignee: Konstantin Boudnik
 Fix For: 0.22.0

 Attachments: HDFS-733.2.patch, HDFS-733.patch, HDFS-733.patch, 
 HDFS-733.patch, HDFS-733.patch


 Details at 
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/58/testReport/
