[jira] [Commented] (HDFS-2529) lastDeletedReport should be scoped to BPOfferService, not DN
[ https://issues.apache.org/jira/browse/HDFS-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171804#comment-13171804 ]

Uma Maheswara Rao G commented on HDFS-2529:

Todd, I think this was already fixed as part of HDFS-2560.

Thanks
Uma

lastDeletedReport should be scoped to BPOfferService, not DN

Key: HDFS-2529
URL: https://issues.apache.org/jira/browse/HDFS-2529
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Todd Lipcon

Each BPOfferService separately tracks and reports deleted blocks. But lastDeletedReport is a member variable in DataNode, so deletion reports may not be triggered on the desired interval.
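The description above points at the shape of the fix: make the deletion-report timer per block pool. A minimal sketch, using placeholder field and method names (the actual HDFS-2560 change may differ):
{code}
// Sketch only: lastDeletedReport moves from DataNode into BPOfferService,
// so each block pool keeps its own deletion-report timer. All names here
// are placeholders, not the actual HDFS-2560 code.
class BPOfferServiceSketch {
  private final long deleteReportIntervalMs;
  private long lastDeletedReport = 0; // per block pool, not per DataNode

  BPOfferServiceSketch(long deleteReportIntervalMs) {
    this.deleteReportIntervalMs = deleteReportIntervalMs;
  }

  /** Called from this block pool's service loop. */
  void maybeReportDeletedBlocks() {
    long now = System.currentTimeMillis();
    if (now - lastDeletedReport > deleteReportIntervalMs) {
      sendDeletedBlocksReport(); // placeholder for the RPC to this pool's NN
      lastDeletedReport = now;   // only this pool's timer advances
    }
  }

  private void sendDeletedBlocksReport() { /* report to this pool's NameNode */ }
}
{code}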
[jira] [Updated] (HDFS-69) Improve dfsadmin command line help
[ https://issues.apache.org/jira/browse/HDFS-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-69:

Target Version/s: 0.24.0, 0.23.1
Affects Version/s: 1.0.0
Status: Patch Available (was: Open)

Improve dfsadmin command line help

Key: HDFS-69
URL: https://issues.apache.org/jira/browse/HDFS-69
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Ravi Phulari
Assignee: Harsh J
Priority: Minor
Attachments: HDFS-69.patch

Enhance the dfsadmin command line help to note that "A quota of one forces a directory to remain empty".
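For context, the kind of help-text change being proposed might look like the sketch below; the exact wording lives in HDFS-69.patch, and the string here is illustrative:
{code}
// Illustrative only -- the real wording is in HDFS-69.patch. The point of
// the change: tell the user that a name quota of 1 leaves no room for any
// child, since the directory itself counts against its own quota.
String setQuotaHelp =
    "-setQuota <quota> <dirname>...<dirname>: Set the name quota for each directory.\n"
  + "\t\tThe quota is a hard limit on the number of names in the tree rooted\n"
  + "\t\tat the directory. Note: a quota of one forces a directory to remain\n"
  + "\t\tempty, because the directory itself counts against the quota.";
{code}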
[jira] [Assigned] (HDFS-197) du fails on Cygwin
[ https://issues.apache.org/jira/browse/HDFS-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J reassigned HDFS-197:

Assignee: (was: Harsh J)

(Still lack a Windows setup locally to test this. Unassigning for now so as not to block.)

du fails on Cygwin

Key: HDFS-197
URL: https://issues.apache.org/jira/browse/HDFS-197
Project: Hadoop HDFS
Issue Type: Bug
Environment: Windows + Cygwin
Reporter: Kohsuke Kawaguchi
Attachments: HADOOP-5486

When I try to run a datanode on Windows, I get the following exception:
{noformat}
java.io.IOException: Expecting a line not the end of stream
	at org.apache.hadoop.fs.DU.parseExecResult(DU.java:181)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
	at org.apache.hadoop.util.Shell.run(Shell.java:134)
	at org.apache.hadoop.fs.DU.<init>(DU.java:53)
	at org.apache.hadoop.fs.DU.<init>(DU.java:63)
	at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.<init>(FSDataset.java:325)
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:681)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:291)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:205)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1238)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1193)
{noformat}
This is because Hadoop execs "du -sk C:\tmp\hadoop-SYSTEM\dfs\data" with a Windows path representation, which cygwin du doesn't understand.
{noformat}
C:\hudson>du -sk C:\tmp\hadoop-SYSTEM\dfs\data
du: cannot access `C:\\tmp\\hadoop-SYSTEM\\dfs\\data': No such file or directory
{noformat}
For this to work correctly, Hadoop would have to run cygpath first to get a Unix path representation, then call du.

Also, I had to use the debugger to get this information. Shell.runCommand should catch IOException from parseExecResult and add the buffered stderr to simplify error diagnostics.
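A minimal sketch of the suggested cygpath step; this illustrates the reporter's proposed workaround rather than the attached patch, and the helper name is invented:
{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Sketch: translate the Windows path with `cygpath -u` before handing it
// to du, so "du -sk /cygdrive/c/tmp/..." runs instead of "du -sk C:\tmp\...".
public class CygpathSketch {
  static String toCygwinPath(String windowsPath) throws IOException {
    Process p = new ProcessBuilder("cygpath", "-u", windowsPath).start();
    BufferedReader r =
        new BufferedReader(new InputStreamReader(p.getInputStream()));
    try {
      String unixPath = r.readLine();
      if (unixPath == null) {
        throw new IOException("cygpath produced no output for " + windowsPath);
      }
      return unixPath; // e.g. /cygdrive/c/tmp/hadoop-SYSTEM/dfs/data
    } finally {
      r.close();
    }
  }
}
{code}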
[jira] [Updated] (HDFS-442) dfsthroughput in test.jar throws NPE
[ https://issues.apache.org/jira/browse/HDFS-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-442:

Target Version/s: 0.24.0, 0.23.1
Fix Version/s: (was: 0.24.0)

Can someone take a look at the trivial patch and review it? Ramya? It should be good for 0.23 as well.

dfsthroughput in test.jar throws NPE

Key: HDFS-442
URL: https://issues.apache.org/jira/browse/HDFS-442
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 0.20.1
Reporter: Ramya Sunil
Assignee: Harsh J
Priority: Minor
Attachments: HDFS-442.patch

On running "hadoop jar hadoop-test.jar dfsthroughput" OR "hadoop org.apache.hadoop.hdfs.BenchmarkThroughput", we get a NullPointerException. Below is the stacktrace:
{noformat}
Exception in thread "main" java.lang.NullPointerException
	at java.util.Hashtable.put(Hashtable.java:394)
	at java.util.Properties.setProperty(Properties.java:143)
	at java.lang.System.setProperty(System.java:731)
	at org.apache.hadoop.hdfs.BenchmarkThroughput.run(BenchmarkThroughput.java:198)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.hdfs.BenchmarkThroughput.main(BenchmarkThroughput.java:229)
{noformat}
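The trace says BenchmarkThroughput.run passed a null value into System.setProperty (Hashtable rejects null keys and values). A hedged sketch of the likely shape of the fix, assuming the null comes from an unset configuration key; the key names here are illustrative, and the real change is in HDFS-442.patch:
{code}
// Assumption: conf.get(...) returned null because the key is unset in an
// HDFS-only configuration; fall back before calling System.setProperty,
// which throws NullPointerException on a null value.
String localDir = conf.get("mapred.temp.dir");   // illustrative key
if (localDir == null) {
  localDir = conf.get("hadoop.tmp.dir");         // a key that is always set
}
System.setProperty("test.build.data", localDir);
{code}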
[jira] [Commented] (HDFS-2413) Add public APIs for safemode
[ https://issues.apache.org/jira/browse/HDFS-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171832#comment-13171832 ]

Harsh J commented on HDFS-2413:

[~umamaheswararao] - You may want to report that downstream as well -- I believe they can surely catch SafeModeExceptions or so, and do better?

[~st...@apache.org] - The JMX output at {{/jmx}} carries this state today. Is that insufficient? I'll consider this inclusion as well, after your reply.

Add public APIs for safemode

Key: HDFS-2413
URL: https://issues.apache.org/jira/browse/HDFS-2413
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Harsh J
Fix For: 0.24.0

Currently the APIs for safe-mode are part of DistributedFileSystem, which is supposed to be a private interface. However, dependent software often wants to wait until the NN is out of safemode. Though it could poll trying to create a file and catching SafeModeException, we should consider making some of these APIs public.
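For reference, this is roughly what waiting on safemode looks like today through the private interface; a sketch assuming the 0.23-era names (SafeModeAction has lived in FSConstants/HdfsConstants depending on version):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class WaitForSafemode {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Downcast to the private interface -- exactly what this issue wants
    // dependent software to stop having to do.
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    while (dfs.setSafeMode(SafeModeAction.SAFEMODE_GET)) {
      Thread.sleep(5000); // poll until the NN leaves safemode
    }
  }
}
{code}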
[jira] [Commented] (HDFS-69) Improve dfsadmin command line help
[ https://issues.apache.org/jira/browse/HDFS-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171849#comment-13171849 ]

Hadoop QA commented on HDFS-69:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12504394/HDFS-69.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated 90 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hdfs.TestParallelRead
org.apache.hadoop.hdfs.TestCrcCorruption
org.apache.hadoop.hdfs.TestQuota
org.apache.hadoop.hdfs.TestFileAppend3
org.apache.hadoop.hdfs.TestDatanodeConfig
org.apache.hadoop.hdfs.TestDatanodeDeath
org.apache.hadoop.hdfs.security.TestDelegationToken
org.apache.hadoop.hdfs.tools.TestGetGroups
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1724//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1724//console

This message is automatically generated.

Improve dfsadmin command line help

Key: HDFS-69
URL: https://issues.apache.org/jira/browse/HDFS-69
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Ravi Phulari
Assignee: Harsh J
Priority: Minor
Attachments: HDFS-69.patch

Enhance the dfsadmin command line help to note that "A quota of one forces a directory to remain empty".
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171853#comment-13171853 ]

Hudson commented on HDFS-2553:

Integrated in Hadoop-Hdfs-trunk #898 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/898/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220317
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

BlockPoolSliceScanner spinning in loop

Key: HDFS-2553
URL: https://issues.apache.org/jira/browse/HDFS-2553
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Todd Lipcon
Assignee: Uma Maheswara Rao G
Priority: Critical
Fix For: 0.24.0, 0.23.1
Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch

Playing with trunk, I managed to get a DataNode in a situation where the BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
{noformat}
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
	at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)
{noformat}
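The committed change is in BlockPoolSliceScanner.java above. As a rough illustration of this class of fix only (placeholder method names, assuming the loop simply lacked a back-off when no block was due for verification):
{code}
// Sketch, not the committed patch: a scan loop that backs off when there
// is nothing to verify, instead of re-checking in a tight loop at 100% CPU.
abstract class ScanLoopSketch {
  void scanLoop() throws InterruptedException {
    while (shouldRun() && isBPServiceAlive()) {
      if (hasBlockDueForVerification()) {
        verifyNextBlock();
      } else {
        Thread.sleep(1000); // nothing due yet; yield instead of spinning
      }
    }
  }
  abstract boolean shouldRun();
  abstract boolean isBPServiceAlive();
  abstract boolean hasBlockDueForVerification();
  abstract void verifyNextBlock();
}
{code}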
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171850#comment-13171850 ]

Hudson commented on HDFS-2700:

Integrated in Hadoop-Hdfs-trunk #898 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/898/])
HDFS-2700. Fix failing TestDataNodeMultipleRegistrations in trunk. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220315
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java

TestDataNodeMultipleRegistrations is failing in trunk

Key: HDFS-2700
URL: https://issues.apache.org/jira/browse/HDFS-2700
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Fix For: 0.24.0
Attachments: HDFS-2700.patch

TestDataNodeMultipleRegistrations has been failing for the last couple of builds: https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171856#comment-13171856 ]

Hudson commented on HDFS-2553:

Integrated in Hadoop-Hdfs-0.23-Build #111 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/111/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220316
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

BlockPoolSliceScanner spinning in loop

Key: HDFS-2553
URL: https://issues.apache.org/jira/browse/HDFS-2553
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Todd Lipcon
Assignee: Uma Maheswara Rao G
Priority: Critical
Fix For: 0.24.0, 0.23.1
Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch

Playing with trunk, I managed to get a DataNode in a situation where the BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
{noformat}
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
	at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)
{noformat}
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171863#comment-13171863 ]

Hudson commented on HDFS-2553:

Integrated in Hadoop-Mapreduce-0.23-Build #131 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/131/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220316
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

BlockPoolSliceScanner spinning in loop

Key: HDFS-2553
URL: https://issues.apache.org/jira/browse/HDFS-2553
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Todd Lipcon
Assignee: Uma Maheswara Rao G
Priority: Critical
Fix For: 0.24.0, 0.23.1
Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch

Playing with trunk, I managed to get a DataNode in a situation where the BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
{noformat}
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
	at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)
{noformat}
[jira] [Commented] (HDFS-1526) Dfs client name for a map/reduce task should have some randomness
[ https://issues.apache.org/jira/browse/HDFS-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171865#comment-13171865 ]

Harsh J commented on HDFS-1526:

Noticed this today while poking around with 0.23.0: if it's not mapreduce, labelling it as 'NONMAPREDUCE' only makes it harder to grep, because there's still a 'MAPREDUCE' in it. It's a nitpick (because IDs don't carry that string), but perhaps you may consider switching to something more 'REGULAR'?

Dfs client name for a map/reduce task should have some randomness

Key: HDFS-1526
URL: https://issues.apache.org/jira/browse/HDFS-1526
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.23.0
Attachments: clientName.patch, randClientId1.patch, randClientId2.patch, randClientId3.patch

Fsck shows one of the files in our dfs cluster is corrupt.
{noformat}
/bin/hadoop fsck aFile -files -blocks -locations
aFile: 4633 bytes, 2 block(s):
aFile: CORRUPT block blk_-4597378336099313975
OK
0. blk_-4597378336099313975_2284630101 len=0 repl=3 [...]
1. blk_5024052590403223424_2284630107 len=4633 repl=3 [...]
Status: CORRUPT
{noformat}
On disk, these two blocks are of the same size and have the same content. It turns out the writer of the file is a multi-threaded map task, and each thread may write to the same file. One possible interleaving of two threads might make this happen:
[T1: create aFile] [T2: delete aFile] [T2: create aFile] [T1: addBlock 0 to aFile] [T2: addBlock 1 to aFile] ...
Because T1 and T2 have the same client name, which is the map task id, the above interactions could complete without any lease exception, eventually leading to a corrupt file. To solve the problem, a mapreduce task's client name could be formed by its task id followed by a random number.
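A minimal sketch of the proposed naming scheme; the exact format used by the randClientId*.patch attachments may differ, and the helper and prefix here are illustrative:
{code}
import java.util.Random;

// Sketch: derive the DFS client name from the task attempt id plus a
// random component, so two threads (or task instances) sharing a task id
// no longer look like the same lease holder to the NameNode.
public class ClientNameSketch {
  private static final Random RAND = new Random();

  static String newClientName(String taskAttemptId) {
    // e.g. DFSClient_attempt_201112180000_0001_m_000000_0_-1234567890
    return "DFSClient_" + taskAttemptId + "_" + RAND.nextInt();
  }
}
{code}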
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171867#comment-13171867 ]

Hudson commented on HDFS-2700:

Integrated in Hadoop-Mapreduce-trunk #931 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/931/])
HDFS-2700. Fix failing TestDataNodeMultipleRegistrations in trunk. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220315
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java

TestDataNodeMultipleRegistrations is failing in trunk

Key: HDFS-2700
URL: https://issues.apache.org/jira/browse/HDFS-2700
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Fix For: 0.24.0
Attachments: HDFS-2700.patch

TestDataNodeMultipleRegistrations has been failing for the last couple of builds: https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171870#comment-13171870 ]

Hudson commented on HDFS-2553:

Integrated in Hadoop-Mapreduce-trunk #931 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/931/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220317
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

BlockPoolSliceScanner spinning in loop

Key: HDFS-2553
URL: https://issues.apache.org/jira/browse/HDFS-2553
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Todd Lipcon
Assignee: Uma Maheswara Rao G
Priority: Critical
Fix For: 0.24.0, 0.23.1
Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch

Playing with trunk, I managed to get a DataNode in a situation where the BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
{noformat}
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
	at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)
{noformat}
[jira] [Assigned] (HDFS-2007) Backupnode downloading image/edits from Namenode at every checkpoint ..
[ https://issues.apache.org/jira/browse/HDFS-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SreeHari reassigned HDFS-2007:

Assignee: SreeHari

Backupnode downloading image/edits from Namenode at every checkpoint ..

Key: HDFS-2007
URL: https://issues.apache.org/jira/browse/HDFS-2007
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Reporter: SreeHari
Assignee: SreeHari
Priority: Minor

After the fix for HDFS-903 (md5 verification of fsimage), the Backupnode downloads the image and edits files from the Namenode every time, since a difference in checkpoint time is always maintained between the Namenode and Backupnode. This happens because the Namenode resets its checkpoint time after every checkpoint (we ignore renewCheckpointTime and pass true explicitly to rollFSImage during endCheckpoint), while the Backupnode sets its checkpoint time to whatever it got from the Namenode during startCheckpoint(). Thus, the checkpoint times will differ at the next checkpoint and will cause the image to be downloaded again.
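The description pins the bug to one hard-coded argument. A sketch of the implied correction, assuming a simplified endCheckpoint signature (the real method lives in the NameNode's checkpoint path and takes more parameters):
{code}
// Sketch: honor the caller's renewCheckpointTime flag rather than
// hard-coding true, so the NN's checkpoint time does not run ahead of the
// BackupNode's even when the two images are in sync.
class CheckpointSketch {
  void endCheckpoint(boolean renewCheckpointTime) {
    rollFSImage(renewCheckpointTime); // was effectively: rollFSImage(true)
  }
  void rollFSImage(boolean renewCheckpointTime) { /* roll image, maybe renew time */ }
}
{code}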
[jira] [Commented] (HDFS-2698) BackupNode is downloading image from NameNode for every checkpoint
[ https://issues.apache.org/jira/browse/HDFS-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171873#comment-13171873 ]

SreeHari commented on HDFS-2698:

Isn't this the same as [https://issues.apache.org/jira/browse/HDFS-2007]?

BackupNode is downloading image from NameNode for every checkpoint

Key: HDFS-2698
URL: https://issues.apache.org/jira/browse/HDFS-2698
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Attachments: rollFSImage.patch, rollFSImage.patch

BackupNode can make periodic checkpoints without downloading image and edits files from the NameNode, by just saving the namespace to local disks. This is not happening because the NN renews its checkpoint time after every checkpoint, thus making its image ahead of the BN's even though they are in sync.
[jira] [Updated] (HDFS-2668) Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision
[ https://issues.apache.org/jira/browse/HDFS-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-2668:

Attachment: TestToReproduceHDFS-2668.patch

Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision

Key: HDFS-2668
URL: https://issues.apache.org/jira/browse/HDFS-2668
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Attachments: TestToReproduceHDFS-2668.patch

I haven't written a test case to verify this yet, but I believe the following assertion is incorrect:
{code}
// Ignore replicas already scheduled to be removed from the DN
if(invalidateBlocks.contains(dn.getStorageID(), block)) {
  assert storedBlock.findDatanode(dn) < 0 : "Block " + block
    + " in recentInvalidatesSet should not appear in DN " + dn;
{code}
The problem is that, when a block is invalidated due to over-replication, it is not immediately removed from the block map. So, if a block report arrives just after a block has been marked as invalidated, but before the block is actually deleted, I think this assertion will trigger incorrectly.
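In other words, during the window between the invalidation decision and the DN executing the deletion, a replica can legitimately appear in both the block map and invalidateBlocks. A sketch of race-tolerant handling, assuming the fix is simply to stop asserting (the eventual patch may do more):
{code}
// Sketch: a replica pending invalidation may still be reported by the DN
// until the deletion actually executes, so skip it instead of asserting
// that it is already gone from the block map.
if (invalidateBlocks.contains(dn.getStorageID(), block)) {
  return storedBlock; // ignore; the scheduled deletion will catch up
}
{code}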
[jira] [Updated] (HDFS-2668) Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision
[ https://issues.apache.org/jira/browse/HDFS-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-2668:

Status: Patch Available (was: Open)

Attached the test patch, which should reproduce the issue. I will remove the wrong assertion in BlockManager with the next patch.

Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision

Key: HDFS-2668
URL: https://issues.apache.org/jira/browse/HDFS-2668
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Attachments: TestToReproduceHDFS-2668.patch

I haven't written a test case to verify this yet, but I believe the following assertion is incorrect:
{code}
// Ignore replicas already scheduled to be removed from the DN
if(invalidateBlocks.contains(dn.getStorageID(), block)) {
  assert storedBlock.findDatanode(dn) < 0 : "Block " + block
    + " in recentInvalidatesSet should not appear in DN " + dn;
{code}
The problem is that, when a block is invalidated due to over-replication, it is not immediately removed from the block map. So, if a block report arrives just after a block has been marked as invalidated, but before the block is actually deleted, I think this assertion will trigger incorrectly.
[jira] [Commented] (HDFS-2668) Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision
[ https://issues.apache.org/jira/browse/HDFS-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171904#comment-13171904 ]

Hadoop QA commented on HDFS-2668:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12507837/TestToReproduceHDFS-2668.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 90 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hdfs.security.token.block.TestBlockToken
org.apache.hadoop.hdfs.TestFileAppend2
org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs
org.apache.hadoop.hdfs.security.TestDelegationToken
org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1725//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1725//console

This message is automatically generated.

Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision

Key: HDFS-2668
URL: https://issues.apache.org/jira/browse/HDFS-2668
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Attachments: TestToReproduceHDFS-2668.patch

I haven't written a test case to verify this yet, but I believe the following assertion is incorrect:
{code}
// Ignore replicas already scheduled to be removed from the DN
if(invalidateBlocks.contains(dn.getStorageID(), block)) {
  assert storedBlock.findDatanode(dn) < 0 : "Block " + block
    + " in recentInvalidatesSet should not appear in DN " + dn;
{code}
The problem is that, when a block is invalidated due to over-replication, it is not immediately removed from the block map. So, if a block report arrives just after a block has been marked as invalidated, but before the block is actually deleted, I think this assertion will trigger incorrectly.
[jira] [Commented] (HDFS-2658) HttpFS introduced 70 javadoc warnings
[ https://issues.apache.org/jira/browse/HDFS-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171907#comment-13171907 ]

Eli Collins commented on HDFS-2658:

+1 thanks Tucu

HttpFS introduced 70 javadoc warnings

Key: HDFS-2658
URL: https://issues.apache.org/jira/browse/HDFS-2658
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.24.0, 0.23.1
Reporter: Eli Collins
Assignee: Alejandro Abdelnur
Fix For: 0.24.0, 0.23.1
Attachments: HDFS-2658.patch

{noformat}
hadoop1 (trunk)$ grep warning javadoc.txt | grep -c httpfs
70
{noformat}
[jira] [Commented] (HDFS-2668) Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision
[ https://issues.apache.org/jira/browse/HDFS-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171910#comment-13171910 ]

Uma Maheswara Rao G commented on HDFS-2668:

To make sure of the issue, I just added a throw of RuntimeException in BlockManager#processReportedBlock:
{code}
// Ignore replicas already scheduled to be removed from the DN
if(invalidateBlocks.contains(dn.getStorageID(), block)) {
  assert storedBlock.findDatanode(dn) < 0 : "Block " + block
    + " in recentInvalidatesSet should not appear in DN " + dn;
  if(storedBlock.findDatanode(dn) >= 0)
    throw new RuntimeException("Block already added into invalidateBlocks. "
      + "But still this block associated with DN storedBlock.findDatanode(dn) = "
      + storedBlock.findDatanode(dn));
  return storedBlock;
}
{code}
After this I ran the above attached test. Below are the logs that prove the issue:
{noformat}
2011-12-18 23:02:42,066 INFO FSNamesystem.audit (FSNamesystem.java:logAuditEvent(220)) - ugi=uma (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/tmp/testBadBlockReportOnTransfer/file1 dst=null perm=null
All blocks of file /tmp/testBadBlockReportOnTransfer/file1 verified to have replication factor 3
2011-12-18 23:02:42,073 INFO blockmanagement.BlockManager (BlockManager.java:setReplication(1814)) - Decreasing replication from 3 to 1 for /tmp/testBadBlockReportOnTransfer/file1
2011-12-18 23:02:42,073 INFO hdfs.StateChange (InvalidateBlocks.java:add(77)) - BLOCK* InvalidateBlocks: add blk_5137102758256792519_1001 to 127.0.0.1:54432
2011-12-18 23:02:42,073 INFO hdfs.StateChange (BlockManager.java:chooseExcessReplicates(1954)) - BLOCK* chooseExcessReplicates: (127.0.0.1:54432, blk_5137102758256792519_1001) is added to recentInvalidateSets
2011-12-18 23:02:42,073 INFO hdfs.StateChange (InvalidateBlocks.java:add(77)) - BLOCK* InvalidateBlocks: add blk_5137102758256792519_1001 to 127.0.0.1:54418
2011-12-18 23:02:42,073 INFO hdfs.StateChange (BlockManager.java:chooseExcessReplicates(1954)) - BLOCK* chooseExcessReplicates: (127.0.0.1:54418, blk_5137102758256792519_1001) is added to recentInvalidateSets
2011-12-18 23:02:42,076 INFO FSNamesystem.audit (FSNamesystem.java:logAuditEvent(220)) - ugi=uma (auth:SIMPLE) ip=/127.0.0.1 cmd=setReplication src=/tmp/testBadBlockReportOnTransfer/file1 dst=null perm=null
...
2011-12-18 23:02:43,343 WARN datanode.DataNode (BPOfferService.java:offerService(537)) - RemoteException in offerService
java.lang.RuntimeException: java.lang.RuntimeException: Block already added into invalidateBlocks. But still this block associated with DN storedBlock.findDatanode(dn) = 1
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:1498)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1418)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1328)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1303)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:847)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:130)
	at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:16189)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:834)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1605)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
	at java.security.AccessController.doPrivileged(Native Method)
{noformat}

Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision

Key: HDFS-2668
URL: https://issues.apache.org/jira/browse/HDFS-2668
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Attachments: TestToReproduceHDFS-2668.patch

I haven't written a test case to verify this yet, but I believe the following assertion is incorrect:
{code}
// Ignore replicas already scheduled to be removed from the DN
if(invalidateBlocks.contains(dn.getStorageID(), block)) {
  assert storedBlock.findDatanode(dn) < 0 : "Block " + block
    + " in recentInvalidatesSet should not appear in DN " + dn;
{code}
The problem is that, when a block is invalidated due to over-replication, it is not immediately removed from the block map. So, if a block report arrives just after a block has been marked as invalidated, but before the block is actually deleted, I think this assertion will trigger incorrectly.
[jira] [Commented] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171911#comment-13171911 ]

Uma Maheswara Rao G commented on HDFS-2335:

Test failures are unrelated to this patch!

DataNodeCluster and NNStorage always pull fresh entropy

Key: HDFS-2335
URL: https://issues.apache.org/jira/browse/HDFS-2335
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node, name-node
Affects Versions: 0.23.0, 0.24.0, 1.0.0
Reporter: Eli Collins
Assignee: Uma Maheswara Rao G
Attachments: HDFS-2335.patch, HDFS-2335.patch

Jira for giving DataNodeCluster and NNStorage the same treatment as HDFS-1835; they're not truly cryptographic uses either. We should also factor this out to a utility method -- it seems the three uses are slightly different, e.g. one uses DFSUtil.getRandom and another creates a new Random object.
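For reference, a DFSUtil.getRandom-style consolidation looks roughly like the sketch below; this assumes the patch follows the existing DFSUtil pattern, and the class and method names here are placeholders:
{code}
import java.util.Random;

// Sketch of a shared, thread-local Random for non-cryptographic uses
// (IDs, test data), replacing per-call Random/SecureRandom constructions
// that pull fresh entropy every time.
public class RandomUtilSketch {
  private static final ThreadLocal<Random> RANDOM = new ThreadLocal<Random>() {
    @Override
    protected Random initialValue() {
      return new Random();
    }
  };

  /** Cheap randomness; NOT for keys, tokens, or anything cryptographic. */
  public static Random getRandom() {
    return RANDOM.get();
  }
}
{code}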
[jira] [Updated] (HDFS-2668) Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision
[ https://issues.apache.org/jira/browse/HDFS-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-2668:

Status: Open (was: Patch Available)

Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision

Key: HDFS-2668
URL: https://issues.apache.org/jira/browse/HDFS-2668
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Attachments: TestToReproduceHDFS-2668.patch

I haven't written a test case to verify this yet, but I believe the following assertion is incorrect:
{code}
// Ignore replicas already scheduled to be removed from the DN
if(invalidateBlocks.contains(dn.getStorageID(), block)) {
  assert storedBlock.findDatanode(dn) < 0 : "Block " + block
    + " in recentInvalidatesSet should not appear in DN " + dn;
{code}
The problem is that, when a block is invalidated due to over-replication, it is not immediately removed from the block map. So, if a block report arrives just after a block has been marked as invalidated, but before the block is actually deleted, I think this assertion will trigger incorrectly.
[jira] [Updated] (HDFS-2701) Cleanup FS* processIOError methods
[ https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2701:

Attachment: hdfs-2701.txt

Thanks for the review Todd. Updated patch attached.

#1 Agree. I've done this in HDFS-2702; I was trying to keep this change to just cleanup/refactoring (the current crazy behavior is actually what causes HDFS-2702!).
#2 Good catch. Fixed.
#3-5 Done.

Wrt testing, see my comment in HDFS-2702. The short answer is that aside from the existing tests, which are clean, I've done manual testing (failing storage dirs and checkpointing) for 2701-2703 and am working on a unit test that will cover storage dir failures and removal.

Cleanup FS* processIOError methods

Key: HDFS-2701
URL: https://issues.apache.org/jira/browse/HDFS-2701
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins
Attachments: hdfs-2701.txt, hdfs-2701.txt, hdfs-2701.txt, hdfs-2701.txt

Let's rename the various processIOError methods to be more descriptive. The current code makes it difficult to identify and reason about bug fixes. While we're at it, let's remove "Fatal" from the "Unable to sync the edit log" log message, since it's not actually a fatal error (this is confusing to users). And the 2NN "Checkpoint done" message should be info, not a warning (also confusing to users). Thanks to HDFS-1073 these issues don't exist on trunk or 23.
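To make the renaming concrete, a hypothetical before/after; the real names are in hdfs-2701.txt, and everything below is illustrative:
{code}
// Before: overloads that all hide what actually happens to a failed dir.
//   void processIOError(int index);
//   void processIOError(ArrayList<StorageDirectory> dirs, boolean propagate);
// After (illustrative names only): one descriptive method per behavior.
abstract class EditLogSketch {
  abstract void disableStorageDirForEdits(int dirIndex);
  abstract void removeStorageDirs(java.util.List<java.io.File> failedDirs);
}
{code}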
[jira] [Commented] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171915#comment-13171915 ]

Eli Collins commented on HDFS-2335:

+1

DataNodeCluster and NNStorage always pull fresh entropy

Key: HDFS-2335
URL: https://issues.apache.org/jira/browse/HDFS-2335
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node, name-node
Affects Versions: 0.23.0, 0.24.0, 1.0.0
Reporter: Eli Collins
Assignee: Uma Maheswara Rao G
Attachments: HDFS-2335.patch, HDFS-2335.patch

Jira for giving DataNodeCluster and NNStorage the same treatment as HDFS-1835; they're not truly cryptographic uses either. We should also factor this out to a utility method -- it seems the three uses are slightly different, e.g. one uses DFSUtil.getRandom and another creates a new Random object.
[jira] [Commented] (HDFS-2657) TestHttpFSServer and TestServerWebApp are failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171917#comment-13171917 ]

Eli Collins commented on HDFS-2657:

Yea, every trunk build for the last couple of days has failed these. TestHttpFSServer fails the same assert and TestServerWebApp gets the following NPE.
{noformat}
org.apache.hadoop.lib.servlet.TestServerWebApp.lifecycle
Failing for the past 5 builds (Since #894) Took 11 ms.

Stacktrace

java.lang.NullPointerException
	at java.util.Properties$LineReader.readLine(Properties.java:418)
	at java.util.Properties.load0(Properties.java:337)
	at java.util.Properties.load(Properties.java:325)
	at org.apache.hadoop.lib.server.Server.init(Server.java:348)
	at org.apache.hadoop.lib.servlet.ServerWebApp.contextInitialized(ServerWebApp.java:142)
	at org.apache.hadoop.lib.servlet.TestServerWebApp.__CLR3_0_2sd9si72uk(TestServerWebApp.java:56)
	at org.apache.hadoop.lib.servlet.TestServerWebApp.lifecycle(TestServerWebApp.java:46)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:108)
	at org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:51)
	at org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:41)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
	at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
	at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:81)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)

Standard Output

test.properties : NONE
test.dir        : /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/testdir
{noformat}

TestHttpFSServer and TestServerWebApp are failing on trunk

Key: HDFS-2657
URL: https://issues.apache.org/jira/browse/HDFS-2657
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Eli Collins
Assignee: Alejandro Abdelnur

org.apache.hadoop.fs.http.server.TestHttpFSServer.instrumentation
org.apache.hadoop.lib.servlet.TestServerWebApp.lifecycle
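The NPE comes from Properties.load being handed a null stream; java.util.Properties offers no guard of its own. A sketch of the kind of check the loading path needs -- the resource name and helper are illustrative, not the HttpFS code:
{code}
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Sketch: Properties.load NPEs when the classloader lookup returns null,
// so the missing-resource case needs an explicit check with a clear error.
public class PropsLoaderSketch {
  static Properties loadResource(ClassLoader cl, String name) throws IOException {
    InputStream is = cl.getResourceAsStream(name);
    if (is == null) {
      throw new IOException("Could not load properties from classpath: " + name);
    }
    try {
      Properties props = new Properties();
      props.load(is);
      return props;
    } finally {
      is.close();
    }
  }
}
{code}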
[jira] [Updated] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2335:

Resolution: Fixed
Target Version/s: 0.23.1 (was: 0.24.0)
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

I've committed this and merged to 23. Thanks Uma!

DataNodeCluster and NNStorage always pull fresh entropy

Key: HDFS-2335
URL: https://issues.apache.org/jira/browse/HDFS-2335
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node, name-node
Affects Versions: 0.23.0, 0.24.0, 1.0.0
Reporter: Eli Collins
Assignee: Uma Maheswara Rao G
Attachments: HDFS-2335.patch, HDFS-2335.patch

Jira for giving DataNodeCluster and NNStorage the same treatment as HDFS-1835; they're not truly cryptographic uses either. We should also factor this out to a utility method -- it seems the three uses are slightly different, e.g. one uses DFSUtil.getRandom and another creates a new Random object.
[jira] [Commented] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171920#comment-13171920 ]

Hudson commented on HDFS-2335:

Integrated in Hadoop-Common-trunk-Commit #1451 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1451/])
HDFS-2335. DataNodeCluster and NNStorage always pull fresh entropy. Contributed by Uma Maheswara Rao G
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220510
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java

DataNodeCluster and NNStorage always pull fresh entropy

Key: HDFS-2335
URL: https://issues.apache.org/jira/browse/HDFS-2335
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node, name-node
Affects Versions: 0.23.0, 0.24.0, 1.0.0
Reporter: Eli Collins
Assignee: Uma Maheswara Rao G
Attachments: HDFS-2335.patch, HDFS-2335.patch

Jira for giving DataNodeCluster and NNStorage the same treatment as HDFS-1835; they're not truly cryptographic uses either. We should also factor this out to a utility method -- it seems the three uses are slightly different, e.g. one uses DFSUtil.getRandom and another creates a new Random object.
[jira] [Commented] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171924#comment-13171924 ]

Hudson commented on HDFS-2335:

Integrated in Hadoop-Hdfs-trunk-Commit #1524 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1524/])
HDFS-2335. DataNodeCluster and NNStorage always pull fresh entropy. Contributed by Uma Maheswara Rao G
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220510
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java

DataNodeCluster and NNStorage always pull fresh entropy

Key: HDFS-2335
URL: https://issues.apache.org/jira/browse/HDFS-2335
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node, name-node
Affects Versions: 0.23.0, 0.24.0, 1.0.0
Reporter: Eli Collins
Assignee: Uma Maheswara Rao G
Attachments: HDFS-2335.patch, HDFS-2335.patch

Jira for giving DataNodeCluster and NNStorage the same treatment as HDFS-1835; they're not truly cryptographic uses either. We should also factor this out to a utility method -- it seems the three uses are slightly different, e.g. one uses DFSUtil.getRandom and another creates a new Random object.
[jira] [Commented] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171925#comment-13171925 ]

Hudson commented on HDFS-2335:

Integrated in Hadoop-Hdfs-0.23-Commit #294 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/294/])
HDFS-2335. svn merge -c 1220510 from trunk
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220513
Files :
* /hadoop/common/branches/branch-0.23
* /hadoop/common/branches/branch-0.23/hadoop-common-project
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/.gitignore
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/io/FileBench.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/io/TestSequenceFileMergeProgress.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/security/authorize/TestServiceLevelAuthorization.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/test/MapredTestDriver.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/webapps/job
*
[jira] [Commented] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171926#comment-13171926 ] Hudson commented on HDFS-2335: -- Integrated in Hadoop-Common-0.23-Commit #305 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/305/]) HDFS-2335. svn merge -c 1220510 from trunk eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1220513 Files : * /hadoop/common/branches/branch-0.23 * /hadoop/common/branches/branch-0.23/hadoop-common-project * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/.gitignore * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++ * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join * 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/io/FileBench.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/io/TestSequenceFileMergeProgress.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/security/authorize/TestServiceLevelAuthorization.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/test/MapredTestDriver.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/webapps/job
[jira] [Commented] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171927#comment-13171927 ] Hudson commented on HDFS-2335: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1474 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1474/]) HDFS-2335. DataNodeCluster and NNStorage always pull fresh entropy. Contributed by Uma Maheswara Rao G eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1220510 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java DataNodeCluster and NNStorage always pull fresh entropy --- Key: HDFS-2335 URL: https://issues.apache.org/jira/browse/HDFS-2335 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Affects Versions: 0.23.0, 0.24.0, 1.0.0 Reporter: Eli Collins Assignee: Uma Maheswara Rao G Attachments: HDFS-2335.patch, HDFS-2335.patch Jira for giving DataNodeCluster and NNStorage the same treatment as HDFS-1835. They're not truly cryptographic uses either. We should also factor this out into a utility method; the three uses seem slightly different, eg one uses DFSUtil.getRandom and another creates a new Random object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
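The pattern behind this fix: reuse a cached, non-blocking PRNG for non-cryptographic uses instead of constructing a fresh entropy source (e.g. a new SecureRandom) on each call. A minimal sketch of a DFSUtil.getRandom-style helper; the class name and layout are illustrative, not the committed code:
{code}
import java.util.Random;

public final class RandomUtilSketch {
  // One lazily created PRNG per thread; avoids contended locking and,
  // unlike pulling fresh entropy per call, never blocks on the OS pool.
  private static final ThreadLocal<Random> RANDOM = new ThreadLocal<Random>() {
    @Override
    protected Random initialValue() {
      return new Random();
    }
  };

  private RandomUtilSketch() {}

  /** Shared generator for non-cryptographic uses (test data, temp names). */
  public static Random getRandom() {
    return RANDOM.get();
  }
}
{code}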
[jira] [Commented] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171929#comment-13171929 ] Hudson commented on HDFS-2335: -- Integrated in Hadoop-Mapreduce-0.23-Commit #316 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/316/]) HDFS-2335. svn merge -c 1220510 from trunk eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1220513 Files : * /hadoop/common/branches/branch-0.23 * /hadoop/common/branches/branch-0.23/hadoop-common-project * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/.gitignore * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++ * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join * 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/io/FileBench.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/io/TestSequenceFileMergeProgress.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/security/authorize/TestServiceLevelAuthorization.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/test/MapredTestDriver.java *
[jira] [Updated] (HDFS-2702) A single failed name dir can cause the NN to exit
[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2702: -- Attachment: hdfs-2702.txt Updated patch with new test class that covers: #1 The NN doesn't exit as long as it has a valid storage dir #2 The NN exits when it no longer has a valid storage dir #3 The removed storage dirs list is updated (fails w/o HDFS-2703) A single failed name dir can cause the NN to exit -- Key: HDFS-2702 URL: https://issues.apache.org/jira/browse/HDFS-2702 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Critical Attachments: hdfs-2702.txt, hdfs-2702.txt There's a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed. Here's the relevant code: {code} close() // So editStreams.size() is 0 foreach edits dir { .. eStream = new ... // Might get an IOE here editStreams.add(eStream); } catch (IOException ioe) { removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1 } {code} If we get an IOException before we've added two edit streams to the list we'll exit, eg if there's an error processing the 1st name dir we'll exit even if there are 4 valid name dirs. The fix is to move the checking out of removeEditsForStorageDir (nee processIOError) or modify it so it can be disabled in some cases, eg here where we don't yet know how many streams are valid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
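The shape of the fix described above, as a simplified sketch: the no-streams check runs once after all dirs have been tried, rather than inside the removal helper. Names like exitIfNoStreams mirror the discussion; the surrounding class is hypothetical, not the actual branch-1 FSEditLog:
{code}
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

class EditLogRollSketch {
  private final List<OutputStream> editStreams = new ArrayList<OutputStream>();

  void rollEditLog(List<String> editsDirs) {
    editStreams.clear();                    // close() emptied the list
    for (String dir : editsDirs) {
      try {
        editStreams.add(openStream(dir));   // may throw IOE
      } catch (IOException ioe) {
        // Drop only the failed dir; do NOT exit while the list is still
        // being rebuilt, or one bad dir aborts the roll with good dirs left.
        removeEditsForStorageDir(dir);
      }
    }
    exitIfNoStreams();                      // check once, after all dirs tried
  }

  private void exitIfNoStreams() {
    if (editStreams.isEmpty()) {
      throw new IllegalStateException("No usable edits directories remain");
    }
  }

  private void removeEditsForStorageDir(String dir) { /* bookkeeping only */ }

  private OutputStream openStream(String dir) throws IOException {
    return new java.io.ByteArrayOutputStream(); // stand-in for a real stream
  }
}
{code}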
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171951#comment-13171951 ] dhruba borthakur commented on HDFS-2699: Thanks for your comments Scott, Andrew, Todd and Allen. Scott: most of our hbase production clusters have io.bytes.per.checksum set to 4096 (instead of 512) Allen: One can put crcs on a logging device, e.g. bookkeeper perhaps? But at the end of the day, each random io from an hdfs file will consume two disk iops (one on the hdfs block storage and one from the logging device), is it not? Won't it be optimal to inline crc and data? If we decide to implement inline crc, can we make hdfs support two different data formats and not do any automatic data format upgrade for existing data? pre-existing data can remain in the older format while newly created files will have data in the new inline-data-and-crc format. What do people think about this idea? Store data and checksums together in block file --- Key: HDFS-2699 URL: https://issues.apache.org/jira/browse/HDFS-2699 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read from HDFS actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
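One way to read the dual-format proposal: readers dispatch on a per-block layout version, so old blocks keep the separate checksum file and only newly written blocks use the inline layout. A hedged sketch; the version constants are invented for illustration:
{code}
import java.io.DataInputStream;
import java.io.IOException;

final class BlockLayoutSketch {
  // Hypothetical layout codes, not the real on-disk constants.
  static final short SEPARATE_CRC_FILE = 1;   // pre-existing blocks
  static final short INLINE_DATA_AND_CRC = 2; // newly created files

  /** Decide how to read a block from the leading version field of its metadata. */
  static boolean hasInlineChecksums(DataInputStream metaIn) throws IOException {
    short version = metaIn.readShort();
    switch (version) {
      case SEPARATE_CRC_FILE:   return false;
      case INLINE_DATA_AND_CRC: return true;
      default: throw new IOException("Unknown block layout version: " + version);
    }
  }
}
{code}
Since the version travels with each block, no bulk upgrade of existing data is needed, which is exactly the compatibility property asked about above.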
[jira] [Updated] (HDFS-2702) A single failed name dir can cause the NN to exit
[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2702: -- Attachment: hdfs-2702.txt Slightly updated patch. I made FSEditLog#logEdit throw an AssertionError (rather than just assert) so we stop the NN if there's a bug where we forget to remove an edit stream after we notice a failed directory. This should never fire, but could if we introduced a bug where eg we missed a call to removeEdits. Updated the test to check that we can't log an edit if there are no streams. A single failed name dir can cause the NN to exit -- Key: HDFS-2702 URL: https://issues.apache.org/jira/browse/HDFS-2702 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Critical Attachments: hdfs-2702.txt, hdfs-2702.txt, hdfs-2702.txt There's a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed. Here's the relevant code: {code} close() // So editStreams.size() is 0 foreach edits dir { .. eStream = new ... // Might get an IOE here editStreams.add(eStream); } catch (IOException ioe) { removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1 } {code} If we get an IOException before we've added two edit streams to the list we'll exit, eg if there's an error processing the 1st name dir we'll exit even if there are 4 valid name dirs. The fix is to move the checking out of removeEditsForStorageDir (nee processIOError) or modify it so it can be disabled in some cases, eg here where we don't yet know how many streams are valid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2662) Namenode should log warning message when trying to start on a unformmatted system
[ https://issues.apache.org/jira/browse/HDFS-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171954#comment-13171954 ] William McNeill commented on HDFS-2662: --- I verified that if I delete the DFS directory (/tmp/hadoop-williammcneill/dfs on my machine) and run start-dfs.sh I get just the SCDynamicStore error message in the DFS log, but the namenode is not running. If I then run hadoop namenode -format and re-run start-dfs.sh the log file is the same--just the SCDynamicStore error message--but the namenode is now running and HDFS operations work. I'll double-check the Kerberos workaround discussed in Jira 7489, but I think that's an unrelated issue since you see the warning message regardless of whether the namenode is running. Namenode should log warning message when trying to start on a unformmatted system - Key: HDFS-2662 URL: https://issues.apache.org/jira/browse/HDFS-2662 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.203.0 Environment: Single-node cluster on OS X 10.7 (Lion) Reporter: William McNeill Priority: Minor Labels: format, logging, namenode When you try to start the namenode for a system that does not have a formatted DFS, it fails silently without any indication that the lack of formatting was the problem. I tried to run start-dfs.sh on a single-node cluster with an unformatted HDFS. The namenode failed to start, but generated no warning messages, and its log was empty. After running hadoop namenode -format everything worked. Details in this thread: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201112.mbox/%3CCAN9z%2BopAn-t_f3FRC%3DDtV0n0ysoKd3Fek-fJPb68PMThiPooKg%40mail.gmail.com%3E This is a difficult problem to diagnose because the namenode gives you no feedback. It would be better if it printed an error message to its log file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
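The requested behavior amounts to an explicit formatted-state check with a loud log line before the namenode gives up. A rough sketch, assuming a storage dir counts as formatted when its current/VERSION file exists; class name, paths, and log wording are illustrative:
{code}
import java.io.File;
import java.util.logging.Logger;

final class FormatCheckSketch {
  private static final Logger LOG = Logger.getLogger("NameNode");

  /** Warn explicitly instead of failing silently on an unformatted dfs.name.dir. */
  static boolean checkFormatted(File nameDir) {
    File version = new File(new File(nameDir, "current"), "VERSION");
    if (!version.isFile()) {
      LOG.warning("Storage directory " + nameDir
          + " does not appear to be formatted."
          + " Run 'hadoop namenode -format' and restart.");
      return false;
    }
    return true;
  }
}
{code}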
[jira] [Commented] (HDFS-2662) Namenode should log warning message when trying to start on a unformmatted system
[ https://issues.apache.org/jira/browse/HDFS-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171955#comment-13171955 ] William McNeill commented on HDFS-2662: --- I have no idea why part of the above comment has strikethrough formatting. Namenode should log warning message when trying to start on a unformmatted system - Key: HDFS-2662 URL: https://issues.apache.org/jira/browse/HDFS-2662 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.203.0 Environment: Single-node cluster on OS X 10.7 (Lion) Reporter: William McNeill Priority: Minor Labels: format, logging, namenode When you try to start the namenode for a system that does not have a formatted DFS, it fails silently without any indication that the lack of formatting was the problem. I tried to run start-dfs.sh on a single-node cluster with an unformatted HDFS. The namenode failed to start, but generated no warning messages, and its log was empty. After running hadoop namenode -format everything worked. Details in this thread: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201112.mbox/%3CCAN9z%2BopAn-t_f3FRC%3DDtV0n0ysoKd3Fek-fJPb68PMThiPooKg%40mail.gmail.com%3E This is a difficult problem to diagnose because the namenode gives you no feedback. It would be better if it printed an error message to its log file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171961#comment-13171961 ] Todd Lipcon commented on HDFS-2699: --- The idea of introducing the new format as a backward-compatible option sounds good to me. That's what we did for the CRC32C checksums - new files are written with that checksum algorithm but old files continue to operate with the old one. Store data and checksums together in block file --- Key: HDFS-2699 URL: https://issues.apache.org/jira/browse/HDFS-2699 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read from HDFS actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2662) Namenode should log warning message when trying to start on a unformmatted system
[ https://issues.apache.org/jira/browse/HDFS-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171962#comment-13171962 ] William McNeill commented on HDFS-2662: --- I tried adding the parameters from 7489 to my hdfs-site.xml configuration file and saw the exact same behavior. I still get the SCDynamicStore warning, and still no logged error about the DFS directories not existing when they haven't been properly formatted. So I can't get the 7489 workaround to work, but I still suspect that's an unrelated issue. Namenode should log warning message when trying to start on a unformmatted system - Key: HDFS-2662 URL: https://issues.apache.org/jira/browse/HDFS-2662 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.203.0 Environment: Single-node cluster on OS X 10.7 (Lion) Reporter: William McNeill Priority: Minor Labels: format, logging, namenode When you try to start the namenode for a system that does not have a formatted DFS, it fails silently without any indication that the lack of formatting was the problem. I tried to run start-dfs.sh on a single-node cluster with an unformatted HDFS. The namenode failed to start, but generated no warning messages, and its log was empty. After running hadoop namenode -format everything worked. Details in this thread: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201112.mbox/%3CCAN9z%2BopAn-t_f3FRC%3DDtV0n0ysoKd3Fek-fJPb68PMThiPooKg%40mail.gmail.com%3E This is a difficult problem to diagnose because the namenode gives you no feedback. It would be better if it printed an error message to its log file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171970#comment-13171970 ] M. C. Srivas commented on HDFS-2699: A couple of observations: a. If you want to eventually support random-IO, then a block size of 4096 is too large for the CRC, as it will cause a read-modify-write cycle on the entire 4K. 512 bytes reduces this overhead. b. Can the value of the variable io.bytes.per.checksum be transferred from the *-site.xml file into the file-properties at the NN at the time of file creation? If someone messes around with it, old files will still work as before. Store data and checksums together in block file --- Key: HDFS-2699 URL: https://issues.apache.org/jira/browse/HDFS-2699 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read from HDFS actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171976#comment-13171976 ] dhruba borthakur commented on HDFS-2699: Thanks Srivas for your comments. bq. a block size of 4096 is too large for the CRC The hbase block size is 16K, the hdfs checksum size is 4K, and the hdfs block size is 256 MB. Which one are you referring to here? Can you please explain the read-modify-write cycle? HDFS does mostly large sequential writes (no overwrites). bq. io.bytes.per.checksum be transferred from the *-site.xml It is already stored in the datanode meta file associated with each block. Different hdfs files in the same hdfs cluster can have different io.bytes.per.checksum values. Store data and checksums together in block file --- Key: HDFS-2699 URL: https://issues.apache.org/jira/browse/HDFS-2699 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read from HDFS actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
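dhruba's point that the chunk size travels with each block can be illustrated by a simplified reader of a block's meta file header; the field order below (layout version, checksum type, bytes per checksum) follows the general shape of the format but is a sketch, not the exact on-disk encoding:
{code}
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

final class MetaHeaderSketch {
  /** Per-block checksum chunk size, as recorded when the block was written. */
  static int bytesPerChecksum(String metaFile) throws IOException {
    DataInputStream in = new DataInputStream(new FileInputStream(metaFile));
    try {
      in.readShort();      // meta file layout version
      in.readByte();       // checksum type, e.g. CRC32
      return in.readInt(); // io.bytes.per.checksum at write time
    } finally {
      in.close();
    }
  }
}
{code}
Because the value is read back from the block itself, changing the site-wide setting later leaves old files readable, which is the behavior noted above.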
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171997#comment-13171997 ] M. C. Srivas commented on HDFS-2699: @dhruba: bq. a block size of 4096 is too large for the CRC The hbase block size is 16K, the hdfs checksum size is 4K, and the hdfs block size is 256 MB. Which one are you referring to here? Can you please explain the read-modify-write cycle? HDFS does mostly large sequential writes (no overwrites). The CRC block size (that is, the contiguous region of the file that a CRC covers). Modifying any portion of that region will require that the entire data for the region be read in, the CRC recomputed for that entire region, and the entire region written out again. Note that it also introduces a new failure mode ... data that was previously written safely a long time ago could now be deemed corrupt since the CRC is no longer good due to a minor modification during an append. The failure scenario is as follows: 1. A thread writes to a file and closes it. Let's say the file length is 9K. There are 3 CRCs embedded inline -- one for 0-4K, one for 4K-8K, and one for 8K-9K. Call the last one CRC3. 2. An append happens a few days later to extend the file from 9K to 11K. CRC3 is now recomputed for the 3K-sized region spanning offsets 8K-11K and written out as CRC3-new. But there is a crash, and the entire 3K is not all written out cleanly (CRC3-new and some data is written out before the crash -- all 3 copies crash and recover). 3. A subsequent read on the region 8K-9K now fails with a CRC error ... even though the write was stable and used to succeed before. If this file was the HBase WAL, wouldn't this result in data loss? Store data and checksums together in block file --- Key: HDFS-2699 URL: https://issues.apache.org/jira/browse/HDFS-2699 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read from HDFS actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
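To make the offsets in this scenario concrete: with 4K chunks, the 9K-to-11K append touches only the third CRC region (8K-12K), which is exactly the CRC3/CRC3-new window that is vulnerable if the writer crashes mid-rewrite. A tiny illustrative helper:
{code}
final class CrcRegionSketch {
  static final int CHUNK = 4096;

  /** Indices of the first and last CRC regions covered by [off, off+len). */
  static long[] regionsTouched(long off, long len) {
    return new long[] { off / CHUNK, (off + len - 1) / CHUNK };
  }

  public static void main(String[] args) {
    long[] r = regionsTouched(9 * 1024, 2 * 1024); // append 2K at offset 9K
    System.out.println("regions " + r[0] + ".." + r[1]); // prints: regions 2..2
    // Region 2 spans 8K-12K, so previously stable bytes at 8K-9K are
    // re-covered by the recomputed CRC3-new during the append.
  }
}
{code}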
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172006#comment-13172006 ] Todd Lipcon commented on HDFS-2699: --- bq. Modifying any portion of that region will require that the entire data for the region be read in, and the CRC recomputed for that entire region and the entire region written out again But the cost of random-reading 4K is essentially the same as the cost of reading 512 bytes. Once you seek to the offset, the data transfer time is insignificant. Plus, given the 4KB page size used by Linux, all IO is already at this granularity. bq. An append happens a few days later to extend the file from 9K to 11K. CRC3 is now recomputed for the 3K-sized region spanning offsets 8K-11K and written out as CRC3-new. But there is a crash... This is an existing issue regardless of whether the checksums are interleaved or separate. The current solution is that we allow a checksum error on the last checksum chunk of a file in the case that it's being recovered after a crash -- iirc only in the case that _all_ replicas have this issue. If there is any valid replica, then we use that and truncate/rollback the other files to the sync boundary. Store data and checksums together in block file --- Key: HDFS-2699 URL: https://issues.apache.org/jira/browse/HDFS-2699 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read from HDFS actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
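A hedged sketch of the recovery rule Todd describes: a checksum failure is forgivable only on the trailing chunk, and only when no replica has a clean copy; otherwise the valid replica wins and the others are truncated back to the sync boundary. Method and parameter names here are invented for illustration:
{code}
final class RecoverySketch {
  /**
   * Whether a checksum error may be tolerated during post-crash block
   * recovery. Mid-block corruption is never forgiven.
   */
  static boolean tolerateChecksumError(boolean isTrailingChunk,
                                       boolean allReplicasCorrupt) {
    return isTrailingChunk && allReplicasCorrupt;
  }
}
{code}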
[jira] [Commented] (HDFS-2701) Cleanup FS* processIOError methods
[ https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172023#comment-13172023 ] Todd Lipcon commented on HDFS-2701: --- in open(), if all of them fail to open, we'll have no edits streams... is that taken care of by 2702? in removeEditsForStorageDir, I think there might be a bug with the following sequence: - dir holding both edits and image fails - restoreFailedStorage is called so it is added back to the list for image operations, but edit logs haven't rolled yet, so it's not in editStreams - it fails again, so removeEditsForStorageDir is called with a dir that doesn't have any open stream. In that case, exitIfInvalidStreams() would exit even though nothing is getting removed. I guess this is taken care of by HDFS-2702? If the answer to both of the above is yes, then +1 :) Cleanup FS* processIOError methods -- Key: HDFS-2701 URL: https://issues.apache.org/jira/browse/HDFS-2701 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-2701.txt, hdfs-2701.txt, hdfs-2701.txt, hdfs-2701.txt Let's rename the various processIOError methods to be more descriptive. The current code makes it difficult to identify and reason about bug fixes. While we're at it let's remove Fatal from the Unable to sync the edit log log since it's not actually a fatal error (this is confusing to users). And 2NN Checkpoint done should be info, not a warning (also confusing to users). Thanks to HDFS-1073 these issues don't exist on trunk or 23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir
[ https://issues.apache.org/jira/browse/HDFS-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172024#comment-13172024 ] Todd Lipcon commented on HDFS-2703: --- +1 removedStorageDirs is not updated everywhere we remove a storage dir Key: HDFS-2703 URL: https://issues.apache.org/jira/browse/HDFS-2703 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-2703.txt There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) where we remove a storage directory but don't add it to the removedStorageDirs list. This means a storage dir may have been removed but we don't see it in the log or Web UI. This doesn't affect trunk/23 since the code there is totally different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2702) A single failed name dir can cause the NN to exit
[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172025#comment-13172025 ] Todd Lipcon commented on HDFS-2702: --- - in {{fatalExit}}, can you change it to: {code} FSNamesystem.LOG.fatal(msg, new Exception(msg)); {code} so that we get a stacktrace in the logs? - in {{exitIfNoStreams}} use {{isEmpty}} instead of comparing {{size() == 0}} - rather than an {{if...throw AssertionError}} maybe just use the {{Preconditions.checkState}} function from guava? Or is guava not in branch-1 yet? (can't remember) - instead of calling {{exitIfNoStreams}} everywhere, maybe {{removeEditsForStorageDir}} can just call it whenever it removes one? Otherwise looks good. A single failed name dir can cause the NN to exit -- Key: HDFS-2702 URL: https://issues.apache.org/jira/browse/HDFS-2702 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Critical Attachments: hdfs-2702.txt, hdfs-2702.txt, hdfs-2702.txt There's a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed. Here's the relevant code: {code} close() // So editStreams.size() is 0 foreach edits dir { .. eStream = new ... // Might get an IOE here editStreams.add(eStream); } catch (IOException ioe) { removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1 } {code} If we get an IOException before we've added two edit streams to the list we'll exit, eg if there's an error processing the 1st name dir we'll exit even if there are 4 valid name dirs. The fix is to move the checking out of removeEditsForStorageDir (nee processIOError) or modify it so it can be disabled in some cases, eg here where we don't yet know how many streams are valid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
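Taken together, the review notes above suggest roughly this shape (plain-Java stand-ins; the Guava call would be Preconditions.checkState if the branch carries Guava):
{code}
import java.util.List;

final class ReviewSketch {
  static void fatalExit(String msg) {
    // Log with a synthetic exception so the call site's stack trace
    // appears in the NN log before the process goes down.
    new Exception(msg).printStackTrace(); // stand-in for LOG.fatal(msg, e)
    Runtime.getRuntime().halt(1);
  }

  static void exitIfNoStreams(List<?> editStreams) {
    if (editStreams.isEmpty()) {          // isEmpty() over size() == 0
      fatalExit("No edit streams remain");
    }
  }

  static void checkState(boolean ok, String msg) {
    // Stand-in for Guava's Preconditions.checkState on Guava-less branches.
    if (!ok) {
      throw new IllegalStateException(msg);
    }
  }
}
{code}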
[jira] [Commented] (HDFS-2679) Add interface to query current state to HAServiceProtocol
[ https://issues.apache.org/jira/browse/HDFS-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172032#comment-13172032 ] Todd Lipcon commented on HDFS-2679: --- +1. I'll commit this momentarily Add interface to query current state to HAServiceProtocol -- Key: HDFS-2679 URL: https://issues.apache.org/jira/browse/HDFS-2679 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-2679.txt, hdfs-2679.txt, hdfs-2679.txt, hdfs-2679.txt, hdfs-2679.txt Let's add an interface to HAServiceProtocol to query the current state of a NameNode for use by the CLI (HAAdmin) and Web UI (HDFS-2677). This essentially makes the names active and standby from ACTIVE_STATE and STANDBY_STATE public interfaces, which IMO seems reasonable. Unlike the other APIs we should be able to use the interface even when HA is not enabled (as by default a non-HA NN is active). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2679) Add interface to query current state to HAServiceProtocol
[ https://issues.apache.org/jira/browse/HDFS-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2679. --- Resolution: Fixed Fix Version/s: HA branch (HDFS-1623) Hadoop Flags: Reviewed Add interface to query current state to HAServiceProtocol -- Key: HDFS-2679 URL: https://issues.apache.org/jira/browse/HDFS-2679 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins Fix For: HA branch (HDFS-1623) Attachments: hdfs-2679.txt, hdfs-2679.txt, hdfs-2679.txt, hdfs-2679.txt, hdfs-2679.txt Let's add an interface to HAServiceProtocol to query the current state of a NameNode for use by the CLI (HAAdmin) and Web UI (HDFS-2677). This essentially makes the names active and standby from ACTIVE_STATE and STANDBY_STATE public interfaces, which IMO seems reasonable. Unlike the other APIs we should be able to use the interface even when HA is not enabled (as by default a non-HA NN is active). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2677) HA: Web UI should indicate the NN state
[ https://issues.apache.org/jira/browse/HDFS-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172034#comment-13172034 ] Todd Lipcon commented on HDFS-2677: --- +1, will commit momentarily HA: Web UI should indicate the NN state --- Key: HDFS-2677 URL: https://issues.apache.org/jira/browse/HDFS-2677 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins Fix For: HA branch (HDFS-1623) Attachments: hdfs-2677.txt, hdfs-2677.txt, hdfs-2677.txt, hdfs-2677.txt The DFS web UI should indicate whether it's an active or standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2677) HA: Web UI should indicate the NN state
[ https://issues.apache.org/jira/browse/HDFS-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2677. --- Resolution: Fixed Fix Version/s: HA branch (HDFS-1623) Hadoop Flags: Reviewed HA: Web UI should indicate the NN state --- Key: HDFS-2677 URL: https://issues.apache.org/jira/browse/HDFS-2677 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins Fix For: HA branch (HDFS-1623) Attachments: hdfs-2677.txt, hdfs-2677.txt, hdfs-2677.txt, hdfs-2677.txt The DFS web UI should indicate whether it's an active or standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-1108) Log newly allocated blocks
[ https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-1108. --- Resolution: Duplicate Resolving this one as duplicate since it got incorporated into HDFS-2602 Log newly allocated blocks -- Key: HDFS-1108 URL: https://issues.apache.org/jira/browse/HDFS-1108 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: dhruba borthakur Assignee: Todd Lipcon Fix For: HA branch (HDFS-1623) Attachments: HDFS-1108.patch, hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, hdfs-1108.txt The current HDFS design says that newly allocated blocks for a file are not persisted in the NN transaction log when the block is allocated. Instead, a hflush() or a close() on the file persists the blocks into the transaction log. It would be nice if we can immediately persist newly allocated blocks (as soon as they are allocated) for specific files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2291: -- Component/s: ha HA: Checkpointing in an HA setup Key: HDFS-2291 URL: https://issues.apache.org/jira/browse/HDFS-2291 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Aaron T. Myers Assignee: Todd Lipcon Fix For: HA branch (HDFS-1623) We obviously need to create checkpoints when HA is enabled. One thought is to use a third, dedicated checkpointing node in addition to the active and standby nodes. Another option would be to make the standby capable of also performing the function of checkpointing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2394) Add tests for Namenode active standby states
[ https://issues.apache.org/jira/browse/HDFS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172038#comment-13172038 ] Todd Lipcon commented on HDFS-2394: --- Hey Suresh, I think this issue has been superseded by some of the other tests that have recently gone into the branch. Would you agree, or do you have some tests you're planning to contribute under this JIRA? Add tests for Namenode active standby states Key: HDFS-2394 URL: https://issues.apache.org/jira/browse/HDFS-2394 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node, test Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2394) Add tests for Namenode active standby states
[ https://issues.apache.org/jira/browse/HDFS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2394: -- Component/s: test ha Add tests for Namenode active standby states Key: HDFS-2394 URL: https://issues.apache.org/jira/browse/HDFS-2394 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node, test Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2704) NameNodeResouceChecker#checkAvailableResources should check for inodes
[ https://issues.apache.org/jira/browse/HDFS-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172046#comment-13172046 ] SreeHari commented on HDFS-2704: Will give a patch for this. NameNodeResouceChecker#checkAvailableResources should check for inodes -- Key: HDFS-2704 URL: https://issues.apache.org/jira/browse/HDFS-2704 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.24.0 Reporter: Eli Collins NameNodeResouceChecker#checkAvailableResources currently just checks for free space. However inodes are also a file system resource that needs to be available (you can run out of inodes but have free space). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
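The standard Java file APIs used for the free-space check expose bytes but not inodes, so an inode check likely needs platform help. One illustrative, Unix-only approach is to shell out to df -i and parse the free-inodes column (column position assumes Linux coreutils output; sketch only):
{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

final class InodeCheckSketch {
  /** Free inodes on the filesystem holding 'path'; Linux 'df -i' layout assumed. */
  static long freeInodes(String path) throws IOException, InterruptedException {
    Process p = new ProcessBuilder("df", "-i", path).start();
    BufferedReader r =
        new BufferedReader(new InputStreamReader(p.getInputStream()));
    try {
      r.readLine();                            // skip the header row
      String[] cols = r.readLine().trim().split("\\s+");
      p.waitFor();
      return Long.parseLong(cols[3]);          // IFree column
    } finally {
      r.close();
    }
  }
}
{code}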
[jira] [Assigned] (HDFS-2704) NameNodeResouceChecker#checkAvailableResources should check for inodes
[ https://issues.apache.org/jira/browse/HDFS-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SreeHari reassigned HDFS-2704: -- Assignee: SreeHari NameNodeResouceChecker#checkAvailableResources should check for inodes -- Key: HDFS-2704 URL: https://issues.apache.org/jira/browse/HDFS-2704 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.24.0 Reporter: Eli Collins Assignee: SreeHari NameNodeResouceChecker#checkAvailableResources currently just checks for free space. However inodes are also a file system resource that needs to be available (you can run out of inodes but have free space). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172059#comment-13172059 ] M. C. Srivas commented on HDFS-2699: @Todd: no one is arguing that putting the CRC inline is not beneficial wrt seek time. Recalculating a CRC over a 4K block is substantially slower than over a 512-byte block (256 bytes vs 2K on average is a 10x factor). Imagine appending continuously to the HBase WAL with the 128-byte records that you mentioned in another thread ... the CPU burn will be much worse with 4K CRC blocks. Secondly, the disk manufacturers guarantee only 512-byte atomicity on disk. Linux doing a 4K block write guarantees almost nothing wrt atomicity of that 4K write to disk. On a crash, unless you are running some sort of RAID or data-journal, there is a likelihood of the 4K block that's in-flight getting corrupted. Store data and checksums together in block file --- Key: HDFS-2699 URL: https://issues.apache.org/jira/browse/HDFS-2699 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read from HDFS actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
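The CPU-cost argument above is simple arithmetic: each small append re-reads and re-CRCs the partially filled tail chunk, which averages half a chunk, so the recompute cost scales linearly with chunk size (2048 vs 256 bytes works out to 8x; the 10x above is a round figure). A trivial check:
{code}
final class CrcCostSketch {
  public static void main(String[] args) {
    for (int chunk : new int[] { 512, 4096 }) {
      // Average bytes re-CRC'd per small append: half the tail chunk.
      System.out.println(chunk + "-byte chunks: ~" + (chunk / 2)
          + " bytes recomputed per append");
    }
  }
}
{code}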