[jira] [Updated] (HDFS-554) BlockInfo.ensureCapacity may get a speedup from System.arraycopy()

2011-12-17 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-554:
-

Status: Open  (was: Patch Available)

> BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
> --
>
> Key: HDFS-554
> URL: https://issues.apache.org/jira/browse/HDFS-554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Steve Loughran
>Assignee: Harsh J
>Priority: Minor
> Fix For: 0.24.0
>
> Attachments: HDFS-554.patch, HDFS-554.txt
>
>
> BlockInfo.ensureCapacity() uses a for() loop to copy the old array data into 
> the expanded array.  {{System.arraycopy()}} is generally much faster for 
> this, as it can do a bulk memory copy. There is also the type-safe Java 6 
> {{Arrays.copyOf()}} to consider, though here it offers no tangible benefit.
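
As a rough sketch of the three approaches (the {{triplets}} field and growth 
logic are simplified assumptions, not the attached patch):

{code}
// Sketch only: "triplets" and the growth amount are simplified assumptions.
class EnsureCapacitySketch {
  private Object[] triplets = new Object[3];

  void growWithLoop(int extra) {
    Object[] old = triplets;
    triplets = new Object[old.length + extra];
    for (int i = 0; i < old.length; i++) {  // element-by-element copy
      triplets[i] = old[i];
    }
  }

  void growWithArraycopy(int extra) {
    Object[] old = triplets;
    triplets = new Object[old.length + extra];
    System.arraycopy(old, 0, triplets, 0, old.length);  // single bulk copy
  }

  void growWithCopyOf(int extra) {
    // Java 6 Arrays.copyOf allocates and copies in one call.
    triplets = java.util.Arrays.copyOf(triplets, triplets.length + extra);
  }
}
{code}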

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-554) BlockInfo.ensureCapacity may get a speedup from System.arraycopy()

2011-12-17 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-554:
-

Status: Patch Available  (was: Open)

> BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
> --
>
> Key: HDFS-554
> URL: https://issues.apache.org/jira/browse/HDFS-554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Steve Loughran
>Assignee: Harsh J
>Priority: Minor
> Fix For: 0.24.0
>
> Attachments: HDFS-554.patch, HDFS-554.txt
>
>
> BlockInfo.ensureCapacity() uses a for() loop to copy the old array data into 
> the expanded array.  {{System.arraycopy()}} is generally much faster for 
> this, as it can do a bulk memory copy. There is also the type-safe Java 6 
> {{Arrays.copyOf()}} to consider, though here it offers no tangible benefit.





[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171758#comment-13171758
 ] 

Hudson commented on HDFS-2553:
--

Integrated in Hadoop-Mapreduce-0.23-Commit #315 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/315/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed 
by Uma Maheswara Rao G.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220316
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java


> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 0.24.0, 0.23.1
>
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)
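
The general shape of the fix is to stop re-checking liveness in a tight loop 
and to back off when there is no work; a minimal sketch with assumed names, 
not the committed patch:

{code}
// Sketch of the anti-spin pattern; isBPServiceAlive/scanOneBlock are stand-ins.
class ScannerSketch {
  volatile boolean shouldRun = true;

  boolean isBPServiceAlive() { return true; }   // stand-in for the DataNode check
  boolean scanOneBlock()     { return false; }  // true if a block was due and scanned

  void scanBlockPoolSlice() throws InterruptedException {
    while (shouldRun) {
      if (!isBPServiceAlive()) {
        break;                // give up instead of re-checking immediately
      }
      if (!scanOneBlock()) {
        Thread.sleep(1000);   // no work: sleep so an idle scanner doesn't burn CPU
      }
    }
  }
}
{code}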





[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171752#comment-13171752
 ] 

Hudson commented on HDFS-2700:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #1473 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1473/])
HDFS-2700. Fix failing TestDataNodeMultipleRegistrations in trunk. 
Contributed by Uma Maheswara Rao G.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220315
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java


> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 0.24.0
>
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations is failing in the last couple of builds:
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/





[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171753#comment-13171753
 ] 

Hudson commented on HDFS-2553:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #1473 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1473/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed 
by Uma Maheswara Rao G.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220317
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java


> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 0.24.0, 0.23.1
>
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)





[jira] [Commented] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir

2011-12-17 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171748#comment-13171748
 ] 

Todd Lipcon commented on HDFS-2703:
---

Looks good, but see my comment on HDFS-2701 about a test plan.

> removedStorageDirs is not updated everywhere we remove a storage dir
> 
>
> Key: HDFS-2703
> URL: https://issues.apache.org/jira/browse/HDFS-2703
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2703.txt
>
>
> There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) 
> where we remove a storage directory but don't add it to the 
> removedStorageDirs list. This means a storage dir may have been removed but 
> we don't see it in the log or Web UI. This doesn't affect trunk/23 since the 
> code there is totally different.
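
The shape of the fix is to funnel every removal through one helper that also 
records the directory; a sketch with assumed names (see the attached patch for 
the real change):

{code}
// Sketch: every removal path records the dir so the log/Web UI can report it.
private final java.util.List<StorageDirectory> storageDirs =
    new java.util.ArrayList<StorageDirectory>();
private final java.util.List<StorageDirectory> removedStorageDirs =
    new java.util.ArrayList<StorageDirectory>();

void removeStorageDir(StorageDirectory sd) {
  storageDirs.remove(sd);
  removedStorageDirs.add(sd);
}
{code}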





[jira] [Commented] (HDFS-2701) Cleanup FS* processIOError methods

2011-12-17 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171747#comment-13171747
 ] 

Todd Lipcon commented on HDFS-2701:
---

- The behavior of exitIfInvalidStreams is extremely counter-intuitive... why 
don't you change it to check for an empty list, and just change the call site 
to call _after_ the errored dir is removed?



In removeEditsAndStorageDir:
{code}
+editStreams.remove(idx);
+fsimage.removeStorageDir(getStorageDirForStream(idx));
{code}
I don't think this is correct -- because getStorageDirForStream is called after 
it's removed from editStreams, it will remove the one that came _after_ the 
stream in the storage dir list (or throw an ArrayIndexOutOfBounds if it was the 
last stream)
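
In other words, resolve the directory while {{idx}} still points at the stream 
(sketch):

{code}
// Look up the storage dir first, then remove the stream.
StorageDirectory sd = getStorageDirForStream(idx);
editStreams.remove(idx);
fsimage.removeStorageDir(sd);
{code}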


In {{removeEditsStreamsAndStorageDirs}}, you can use a foreach loop instead of 
indexed iteration:
{code}
+for (int i = 0; i < errorStreams.size(); i++) {
+  int idx = editStreams.indexOf(errorStreams.get(i));
{code}
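
i.e., something like (sketch):

{code}
for (EditLogOutputStream errorStream : errorStreams) {
  int idx = editStreams.indexOf(errorStream);
  // ... handle the errored stream as before ...
}
{code}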


{code}
+FSNamesystem.LOG.error("Unable to sync edit log");
{code}
we should probably include the path of the failed stream here


{code}
+  throw new IOException(
+  "Inconsistent existance of edits.new " + editsNew);
{code}
spelling error - should be "existence"



What's the test plan for this, HDFS-2702, and HDFS-2703? I agree it's buggy, but 
we should articulate a way to make sure we fixed all the issues and didn't 
introduce new ones.


> Cleanup FS* processIOError methods
> --
>
> Key: HDFS-2701
> URL: https://issues.apache.org/jira/browse/HDFS-2701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2701.txt, hdfs-2701.txt, hdfs-2701.txt
>
>
> Let's rename the various "processIOError" methods to be more descriptive. The 
> current code makes it difficult to identify and reason about bug fixes. While 
> we're at it let's remove "Fatal" from the "Unable to sync the edit log" log 
> since it's not actually a fatal error (this is confusing to users). And 2NN 
> "Checkpoint done" should be info, not a warning (also confusing to users).
> Thanks to HDFS-1073 these issues don't exist on trunk or 23.





[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171745#comment-13171745
 ] 

Hudson commented on HDFS-2553:
--

Integrated in Hadoop-Common-trunk-Commit #1450 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1450/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed 
by Uma Maheswara Rao G.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220317
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java


> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 0.24.0, 0.23.1
>
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)





[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171744#comment-13171744
 ] 

Hudson commented on HDFS-2700:
--

Integrated in Hadoop-Common-trunk-Commit #1450 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1450/])
HDFS-2700. Fix failing TestDataNodeMultipleRegistrations in trunk. 
Contributed by Uma Maheswara Rao G.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220315
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java


> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 0.24.0
>
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations is failing in the last couple of builds:
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/





[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171741#comment-13171741
 ] 

Hudson commented on HDFS-2553:
--

Integrated in Hadoop-Hdfs-trunk-Commit #1523 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1523/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed 
by Uma Maheswara Rao G.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220317
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java


> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 0.24.0, 0.23.1
>
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)





[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171740#comment-13171740
 ] 

Hudson commented on HDFS-2700:
--

Integrated in Hadoop-Hdfs-trunk-Commit #1523 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1523/])
HDFS-2700. Fix failing TestDataNodeMultipleRegistrations in trunk. 
Contributed by Uma Maheswara Rao G.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220315
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java


> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 0.24.0
>
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations is failing in the last couple of builds:
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/





[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171743#comment-13171743
 ] 

Hudson commented on HDFS-2553:
--

Integrated in Hadoop-Common-0.23-Commit #304 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/304/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed 
by Uma Maheswara Rao G.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220316
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java


> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 0.24.0, 0.23.1
>
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)





[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171738#comment-13171738
 ] 

Hudson commented on HDFS-2553:
--

Integrated in Hadoop-Hdfs-0.23-Commit #293 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/293/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed 
by Uma Maheswara Rao G.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220316
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java


> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 0.24.0, 0.23.1
>
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)





[jira] [Resolved] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-2553.
---

      Resolution: Fixed
   Fix Version/s: 0.23.1
                  0.24.0
Target Version/s: 0.24.0, 0.23.1  (was: 0.23.1, 0.24.0)
    Hadoop Flags: Reviewed

Committed to trunk and 23. Thanks, Uma!

> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 0.24.0, 0.23.1
>
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)





[jira] [Updated] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2553:
--

Target Version/s: 0.24.0, 0.23.1  (was: 0.23.1, 0.24.0)
  Status: Open  (was: Patch Available)

> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)





[jira] [Updated] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Todd Lipcon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2700:
--

   Resolution: Fixed
Fix Version/s: 0.24.0
 Hadoop Flags: Reviewed
       Status: Resolved  (was: Patch Available)

Committed to trunk, thx Uma.

> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 0.24.0
>
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations is failing in the last couple of builds:
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/





[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171732#comment-13171732
 ] 

Todd Lipcon commented on HDFS-2700:
---

+1, thanks for taking care of this, Uma. Will commit momentarily.

> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations is failing in the last couple of builds:
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/





[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread Scott Carey (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171712#comment-13171712
 ] 

Scott Carey commented on HDFS-2699:
---

That brings up a related question:  Why a 4 byte crc per 512 bytes and not per 
4096 bytes?

512 aligns with the old hard-drive block size: the physical media had ECC at 
512-byte blocks and could not read or write in smaller chunks than that. New 
hard drives all have 4096-byte blocks and ECC at that granularity; no smaller 
chunk can be read or written. SSDs use 4096- or 8192-byte blocks these days.

If the physical media is corrupting blocks, these will most likely be corrupted 
in 4k chunks.  A CRC per 4k decreases the checksum overhead by a factor of 8, 
increasing the likelihood of finding it in OS cache if it is in a side file.  
Now that CRC is accelerated by the processor and very fast, I don't think the 
overhead of the larger block CRC for reads smaller than 4k will matter either.

Inlining the CRC could decrease seek and OS pagecache overhead a lot. Since 
most file systems and OSes work on 4k blocks, HDFS could store a 4-byte CRC and 
4092 bytes of data in a single OS/disk page (or eight 4-byte CRCs and 4064 
bytes in a page). This has a big advantage: if your data is in the OS 
pagecache, the CRC will be too; one will never be written to disk without the 
other, nor evicted from cache without the other.
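
For concreteness, the overhead and layout arithmetic (a back-of-the-envelope 
sketch assuming a 4-byte CRC32):

{code}
public class CrcOverhead {
  public static void main(String[] args) {
    System.out.printf("per-512B chunk:  %.3f%%%n", 100.0 * 4 / 512);   // ~0.781%
    System.out.printf("per-4096B chunk: %.3f%%%n", 100.0 * 4 / 4096);  // ~0.098%
    // Inline layout for one 4 KB page:
    System.out.println("data bytes, 1 CRC per page:  " + (4096 - 4));      // 4092
    System.out.println("data bytes, 8 CRCs per page: " + (4096 - 8 * 4));  // 4064
  }
}
{code}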

> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the datafile and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually  bottlenecked on the number of random disk iops that the 
> storage-hardware offers.





[jira] [Commented] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2

2011-12-17 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171704#comment-13171704
 ] 

Todd Lipcon commented on HDFS-2654:
---

woops, sorry to have missed that. IOUtils.closeStream is better.

> Make BlockReaderLocal not extend RemoteBlockReader2
> ---
>
> Key: HDFS-2654
> URL: https://issues.apache.org/jira/browse/HDFS-2654
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.23.1, 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2654-1.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, 
> hdfs-2654-2.patch, hdfs-2654-3.patch, hdfs-2654-b1-1.patch, 
> hdfs-2654-b1-2.patch, hdfs-2654-b1-3.patch, hdfs-2654-b1-4-fix.patch, 
> hdfs-2654-b1-4.patch
>
>
> The BlockReaderLocal code paths are easier to understand (especially true on 
> branch-1 where BlockReaderLocal inherits code from BlockReader and 
> FSInputChecker) if the local and remote block reader implementations are 
> independent, and they're not really sharing much code anyway. If for some 
> reason they start to share significant code we can make the BlockReader 
> interface an abstract class.





[jira] [Updated] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy

2011-12-17 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2335:
--

Affects Version/s: 1.0.0
                   0.24.0
                   0.23.0

> DataNodeCluster and NNStorage always pull fresh entropy
> ---
>
> Key: HDFS-2335
> URL: https://issues.apache.org/jira/browse/HDFS-2335
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, name-node
>Affects Versions: 0.23.0, 0.24.0, 1.0.0
>Reporter: Eli Collins
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2335.patch, HDFS-2335.patch
>
>
> JIRA for giving DataNodeCluster and NNStorage the same treatment as 
> HDFS-1835; they are not truly cryptographic uses either. We should also 
> factor this out into a utility method; the three uses are slightly 
> different, e.g. one uses DFSUtil.getRandom and another creates a new Random 
> object.
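
A sketch of the suggested utility (assumed shape and name, not the attached 
patch): cache one PRNG instead of pulling fresh entropy or constructing a new 
Random per call.

{code}
// Hypothetical helper for the non-cryptographic uses named in this issue.
public final class RandomUtil {
  private static final java.util.Random RANDOM = new java.util.Random();

  private RandomUtil() {}

  public static int nextNonNegativeInt() {
    return RANDOM.nextInt(Integer.MAX_VALUE);  // reused instance, no fresh entropy pull
  }
}
{code}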





[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171698#comment-13171698
 ] 

Todd Lipcon commented on HDFS-2699:
---

+1 on considering putting them in the same file. Block files already have a 
metadata header, so we could backward-compatibly support the earlier format 
without requiring a data rewrite on upgrade (prohibitively expensive).

Regarding the other ideas, like caching checksums in buffer cache or on SSD, I 
think the issue here is that the 0.78% overhead (4/512) still makes for fairly 
large checksum size on a big DN. For example, if the application has a dataset 
of 4TB per node, then even caching just the checksums is 31GB of RAM. If you're 
mostly missing HBase's data cache, then you'll probably be missing the checksum 
cache too (are you really going to devote 30G to it?)
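
The 31GB figure follows directly from the 4/512 ratio; a quick check:

{code}
public class ChecksumCacheSize {
  public static void main(String[] args) {
    long dataset = 4000000000000L;       // 4 TB of data per node
    long crcBytes = dataset / 512 * 4;   // one 4-byte CRC per 512-byte chunk
    System.out.println(crcBytes / 1e9 + " GB");  // 31.25 GB
  }
}
{code}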

> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the datafile and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually  bottlenecked on the number of random disk iops that the 
> storage-hardware offers.





[jira] [Updated] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2654:
--

Attachment: hdfs-2654-b1-4-fix.patch

In hdfs-2654-b1-4.patch I replaced IOUtils.closeStream with reader.close, 
which is incorrect (reader may be null). Fixing that.
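
A sketch of why the null-safe form matters ({{openReader}} is a hypothetical 
stand-in):

{code}
java.io.InputStream reader = null;
try {
  reader = openReader();  // hypothetical; may throw before reader is assigned
  // ... use reader ...
} finally {
  // reader.close() would NPE if openReader() threw; closeStream ignores null.
  org.apache.hadoop.io.IOUtils.closeStream(reader);
}
{code}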

> Make BlockReaderLocal not extend RemoteBlockReader2
> ---
>
> Key: HDFS-2654
> URL: https://issues.apache.org/jira/browse/HDFS-2654
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.23.1, 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2654-1.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, 
> hdfs-2654-2.patch, hdfs-2654-3.patch, hdfs-2654-b1-1.patch, 
> hdfs-2654-b1-2.patch, hdfs-2654-b1-3.patch, hdfs-2654-b1-4-fix.patch, 
> hdfs-2654-b1-4.patch
>
>
> The BlockReaderLocal code paths are easier to understand (especially true on 
> branch-1 where BlockReaderLocal inherits code from BlockReader and 
> FSInputChecker) if the local and remote block reader implementations are 
> independent, and they're not really sharing much code anyway. If for some 
> reason they start to share significant code we can make the BlockReader 
> interface an abstract class.





[jira] [Updated] (HDFS-2679) Add interface to query current state to HAServiceProtocol

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2679:
--

Attachment: hdfs-2679.txt

Updated patch to reflect HADOOP-7925.

> Add interface to query current state to HAServiceProtocol 
> --
>
> Key: HDFS-2679
> URL: https://issues.apache.org/jira/browse/HDFS-2679
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2679.txt, hdfs-2679.txt, hdfs-2679.txt, 
> hdfs-2679.txt, hdfs-2679.txt
>
>
> Let's add an interface to HAServiceProtocol to query the current state of a 
> NameNode for use by the CLI (HAAdmin) and Web UI (HDFS-2677). This 
> essentially makes the names "active" and "standby" from ACTIVE_STATE and 
> STANDBY_STATE public interfaces, which IMO seems reasonable. Unlike the other 
> APIs we should be able to use the interface even when HA is not enabled (as 
> by default a non-HA NN is active).
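
A sketch of the assumed shape of the addition (not necessarily the committed 
signature):

{code}
// Exposes the existing ACTIVE_STATE/STANDBY_STATE names as a queryable state.
public interface HAServiceProtocolSketch {
  enum HAServiceState { ACTIVE, STANDBY }

  /** Current state; a non-HA NN would report ACTIVE by default. */
  HAServiceState getServiceState() throws java.io.IOException;
}
{code}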





[jira] [Updated] (HDFS-2677) HA: Web UI should indicate the NN state

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2677:
--

Attachment: hdfs-2677.txt

Updated patch to reflect HADOOP-7925.

> HA: Web UI should indicate the NN state
> ---
>
> Key: HDFS-2677
> URL: https://issues.apache.org/jira/browse/HDFS-2677
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2677.txt, hdfs-2677.txt, hdfs-2677.txt, 
> hdfs-2677.txt
>
>
> The DFS web UI should indicate whether it's an active or standby.





[jira] [Updated] (HDFS-2701) Cleanup FS* processIOError methods

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2701:
--

Attachment: hdfs-2701.txt

Minor update.

> Cleanup FS* processIOError methods
> --
>
> Key: HDFS-2701
> URL: https://issues.apache.org/jira/browse/HDFS-2701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2701.txt, hdfs-2701.txt, hdfs-2701.txt
>
>
> Let's rename the various "processIOError" methods to be more descriptive. The 
> current code makes it difficult to identify and reason about bug fixes. While 
> we're at it let's remove "Fatal" from the "Unable to sync the edit log" log 
> since it's not actually a fatal error (this is confusing to users). And 2NN 
> "Checkpoint done" should be info, not a warning (also confusing to users).
> Thanks to HDFS-1073 these issues don't exist on trunk or 23.





[jira] [Commented] (HDFS-2702) A single failed name dir can cause the NN to exit

2011-12-17 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171685#comment-13171685
 ] 

Eli Collins commented on HDFS-2702:
---

And verified the NN still exits after all storage directories have failed.

> A single failed name dir can cause the NN to exit 
> --
>
> Key: HDFS-2702
> URL: https://issues.apache.org/jira/browse/HDFS-2702
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Critical
> Attachments: hdfs-2702.txt
>
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process 
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close()  // So editStreams.size() is 0 
> foreach edits dir {
>   ..
>   eStream = new ...  // Might get an IOE here
>   editStreams.add(eStream);
> } catch (IOException ioe) {
>   removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1  
> }
> {code}
> If we get an IOException before we've added two edits streams to the list 
> we'll exit, eg if there's an error processing the 1st name dir we'll exit 
> even if there are 4 valid name dirs. The fix is to move the checking out of 
> removeEditsForStorageDir (nee processIOError) or modify it so it can be 
> disabled in some cases, eg here where we don't yet know how many streams are 
> valid.
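
The fix direction described above, as a sketch (helper names are assumed from 
the pseudocode, not the attached patch):

{code}
// Helper names (editsDirs, newEditStream, removeStorageDir,
// exitWithFatalError) are assumptions for illustration.
void rollEditLog() throws IOException {
  close();  // editStreams is now empty
  for (StorageDirectory sd : editsDirs()) {
    try {
      editStreams.add(newEditStream(sd));   // may throw an IOE
    } catch (IOException ioe) {
      removeStorageDir(sd);  // record the failure, but do NOT exit here:
    }                        // we don't yet know how many streams are valid
  }
  if (editStreams.isEmpty()) {
    exitWithFatalError("no usable edits directories");  // only now is exiting justified
  }
}
{code}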





[jira] [Updated] (HDFS-2702) A single failed name dir can cause the NN to exit

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2702:
--

Attachment: hdfs-2702.txt

Patch attached. Depends on HDFS-2701 and HDFS-2703. I verified the NN no longer 
exits after checkpointing when a single dir failed. I'm working on a unit test 
for this jira and 2703 but thought I'd put up a fix for review in the meantime. 
 

> A single failed name dir can cause the NN to exit 
> --
>
> Key: HDFS-2702
> URL: https://issues.apache.org/jira/browse/HDFS-2702
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Critical
> Attachments: hdfs-2702.txt
>
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process 
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close()  // So editStreams.size() is 0 
> foreach edits dir {
>   ..
>   eStream = new ...  // Might get an IOE here
>   editStreams.add(eStream);
> } catch (IOException ioe) {
>   removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1  
> }
> {code}
> If we get an IOException before we've added two edits streams to the list 
> we'll exit, eg if there's an error processing the 1st name dir we'll exit 
> even if there are 4 valid name dirs. The fix is to move the checking out of 
> removeEditsForStorageDir (nee processIOError) or modify it so it can be 
> disabled in some cases, eg here where we don't yet know how many streams are 
> valid.





[jira] [Commented] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir

2011-12-17 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171679#comment-13171679
 ] 

Eli Collins commented on HDFS-2703:
---

This applies atop HDFS-2701.

> removedStorageDirs is not updated everywhere we remove a storage dir
> 
>
> Key: HDFS-2703
> URL: https://issues.apache.org/jira/browse/HDFS-2703
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2703.txt
>
>
> There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) 
> where we remove a storage directory but don't add it to the 
> removedStorageDirs list. This means a storage dir may have been removed but 
> we don't see it in the log or Web UI. This doesn't affect trunk/23 since the 
> code there is totally different.





[jira] [Commented] (HDFS-2702) A single failed name dir can cause the NN to exit

2011-12-17 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171678#comment-13171678
 ] 

Eli Collins commented on HDFS-2702:
---

This bug exists in FSEditLog#close as well.

> A single failed name dir can cause the NN to exit 
> --
>
> Key: HDFS-2702
> URL: https://issues.apache.org/jira/browse/HDFS-2702
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Critical
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process 
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close()  // So editStreams.size() is 0 
> foreach edits dir {
>   ..
>   eStream = new ...  // Might get an IOE here
>   editStreams.add(eStream);
> } catch (IOException ioe) {
>   removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1  
> }
> {code}
> If we get an IOException before we've added two edits streams to the list 
> we'll exit, eg if there's an error processing the 1st name dir we'll exit 
> even if there are 4 valid name dirs. The fix is to move the checking out of 
> removeEditsForStorageDir (nee processIOError) or modify it so it can be 
> disabled in some cases, eg here where we don't yet know how many streams are 
> valid.





[jira] [Updated] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2703:
--

Attachment: hdfs-2703.txt

Patch attached. Verified that when the 2NN triggers a roll and a storage 
directory fails, a warning now shows up in the log and the Web UI lists the 
storage directory as failed.

> removedStorageDirs is not updated everywhere we remove a storage dir
> 
>
> Key: HDFS-2703
> URL: https://issues.apache.org/jira/browse/HDFS-2703
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2703.txt
>
>
> There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) 
> where we remove a storage directory but don't add it to the 
> removedStorageDirs list. This means a storage dir may have been removed but 
> we don't see it in the log or Web UI. This doesn't affect trunk/23 since the 
> code there is totally different.





[jira] [Updated] (HDFS-2701) Cleanup FS* processIOError methods

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2701:
--

Attachment: hdfs-2701.txt

test-patch results. Just cleanup so existing tests suffice. Will do a full test 
run before committing.

{noformat}
 [exec] 
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no tests are needed for 
this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
{noformat}

> Cleanup FS* processIOError methods
> --
>
> Key: HDFS-2701
> URL: https://issues.apache.org/jira/browse/HDFS-2701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2701.txt, hdfs-2701.txt
>
>
> Let's rename the various "processIOError" methods to be more descriptive. The 
> current code makes it difficult to identify and reason about bug fixes. While 
> we're at it let's remove "Fatal" from the "Unable to sync the edit log" log 
> since it's not actually a fatal error (this is confusing to users). And 2NN 
> "Checkpoint done" should be info, not a warning (also confusing to users).
> Thanks to HDFS-1073 these issues don't exist on trunk or 23.





[jira] [Created] (HDFS-2704) NameNodeResourceChecker#checkAvailableResources should check for inodes

2011-12-17 Thread Eli Collins (Created) (JIRA)
NameNodeResourceChecker#checkAvailableResources should check for inodes
--

 Key: HDFS-2704
 URL: https://issues.apache.org/jira/browse/HDFS-2704
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.24.0
Reporter: Eli Collins


NameNodeResourceChecker#checkAvailableResources currently checks only for free 
space. However, inodes are also a file-system resource that needs to be 
available: you can run out of inodes while still having free space.
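
Java exposes no portable free-inode API, so one assumed approach is to parse 
{{df -Pi <dir>}} where the platform supports it (a sketch, not a committed 
design):

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

class InodeCheck {
  static long freeInodes(String dir) throws IOException, InterruptedException {
    Process p = new ProcessBuilder("df", "-Pi", dir).start();
    BufferedReader r =
        new BufferedReader(new InputStreamReader(p.getInputStream()));
    r.readLine();                                   // skip the header row
    String[] f = r.readLine().trim().split("\\s+");
    p.waitFor();
    return Long.parseLong(f[3]);                    // the IFree column
  }
}
{code}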





[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread Allen Wittenauer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171660#comment-13171660
 ] 

Allen Wittenauer commented on HDFS-2699:


Would it make sense to make the equivalent of a logging device instead?  In 
other words, put the meta files on a dedicated fast/small disk to segregate 
them from the actual data blocks?  Besides just being able to pick a better 
storage medium, it might potentially allow for better caching strategies at the 
OS level (depending upon the OS of course).

> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the datafile and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually  bottlenecked on the number of random disk iops that the 
> storage-hardware offers.





[jira] [Updated] (HDFS-2702) A single failed name dir can cause the NN to exit

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2702:
--

 Target Version/s: 1.1.0
Affects Version/s: 1.1.0  (was: 0.20.205.0)
    Fix Version/s: (was: 1.1.0)

> A single failed name dir can cause the NN to exit 
> --
>
> Key: HDFS-2702
> URL: https://issues.apache.org/jira/browse/HDFS-2702
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Critical
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process 
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close()  // So editStreams.size() is 0 
> foreach edits dir {
>   ..
>   eStream = new ...  // Might get an IOE here
>   editStreams.add(eStream);
> } catch (IOException ioe) {
>   removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1  
> }
> {code}
> If we get an IOException before we've added two edits streams to the list 
> we'll exit, eg if there's an error processing the 1st name dir we'll exit 
> even if there are 4 valid name dirs. The fix is to move the checking out of 
> removeEditsForStorageDir (nee processIOError) or modify it so it can be 
> disabled in some cases, eg here where we don't yet know how many streams are 
> valid.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2701) Cleanup FS* processIOError methods

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2701:
--

 Target Version/s: 1.1.0
Affects Version/s: (was: 0.20.205.0)
   1.0.0
Fix Version/s: (was: 1.1.0)

> Cleanup FS* processIOError methods
> --
>
> Key: HDFS-2701
> URL: https://issues.apache.org/jira/browse/HDFS-2701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-2701.txt
>
>
> Let's rename the various "processIOError" methods to be more descriptive. The 
> current code makes it difficult to identify and reason about bug fixes. While 
> we're at it let's remove "Fatal" from the "Unable to sync the edit log" log 
> since it's not actually a fatal error (this is confusing to users). And 2NN 
> "Checkpoint done" should be info, not a warning (also confusing to users).
> Thanks to HDFS-1073 these issues don't exist on trunk or 23.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir

2011-12-17 Thread Eli Collins (Created) (JIRA)
removedStorageDirs is not updated everywhere we remove a storage dir


 Key: HDFS-2703
 URL: https://issues.apache.org/jira/browse/HDFS-2703
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins


There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) 
where we remove a storage directory but don't add it to the removedStorageDirs 
list. This means a storage dir may have been removed but we don't see it in the 
log or Web UI. This doesn't affect trunk/23 since the code there is totally 
different.
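
A minimal sketch of the shape such a fix could take (the names below are 
illustrative, not the real FSEditLog internals): funnel every removal through 
one helper so the removed list can never be skipped:

{code}
// Hedged sketch with assumed names: all storage-dir removals go through a
// single helper, so removedStorageDirs is always updated alongside the removal.
private void removeStorageDir(StorageDirectory sd) {
  for (Iterator<EditLogOutputStream> it = editStreams.iterator(); it.hasNext();) {
    if (streamBelongsTo(it.next(), sd)) { // hypothetical association check
      it.remove();
    }
  }
  removedStorageDirs.add(sd); // recorded once, so the log and Web UI stay accurate
}
{code}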

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2702) A single failed name dir can cause the NN to exit

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2702:
--

Affects Version/s: (was: 1.1.0)
   1.0.0

> A single failed name dir can cause the NN to exit 
> --
>
> Key: HDFS-2702
> URL: https://issues.apache.org/jira/browse/HDFS-2702
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Critical
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process 
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close()  // So editStreams.size() is 0 
> foreach edits dir {
>   ..
>   eStream = new ...  // Might get an IOE here
>   editStreams.add(eStream);
> } catch (IOException ioe) {
>   removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1  
> }
> {code}
> If we get an IOException before we've added two edits streams to the list 
> we'll exit, eg if there's an error processing the 1st name dir we'll exit 
> even if there are 4 valid name dirs. The fix is to move the checking out of 
> removeEditsForStorageDir (nee processIOError) or modify it so it can be 
> disabled in some cases, eg here where we don't yet know how many streams are 
> valid.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2702) A single failed name dir can cause the NN to exit

2011-12-17 Thread Eli Collins (Created) (JIRA)
A single failed name dir can cause the NN to exit 
--

 Key: HDFS-2702
 URL: https://issues.apache.org/jira/browse/HDFS-2702
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.205.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Critical
 Fix For: 1.1.0


There's a bug in FSEditLog#rollEditLog which results in the NN process exiting 
if a single name dir has failed. Here's the relevant code:

{code}
close()  // So editStreams.size() is 0 
foreach edits dir {
  ..
  eStream = new ...  // Might get an IOE here
  editStreams.add(eStream);
} catch (IOException ioe) {
  removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1  
}
{code}

If we get an IOException before we've added two edits streams to the list we'll 
exit, eg if there's an error processing the 1st name dir we'll exit even if 
there are 4 valid name dirs. The fix is to move the checking out of 
removeEditsForStorageDir (nee processIOError) or modify it so it can be 
disabled in some cases, eg here where we don't yet know how many streams are 
valid.
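
A minimal sketch of that fix shape (helper names here are hypothetical, not 
the committed patch): defer the "too few streams" judgment until the loop has 
tried every directory:

{code}
// Hedged sketch: rebuild the stream list first, judge its size only afterwards.
close();  // editStreams.size() is now 0
for (StorageDirectory sd : editsDirs) {
  try {
    editStreams.add(createStream(sd)); // hypothetical factory, may throw IOE
  } catch (IOException ioe) {
    recordFailedStorageDir(sd); // hypothetical: record the failure, never exit here
  }
}
// Only now do we know how many streams are actually valid.
if (editStreams.isEmpty()) {
  exitWithFatalError("no usable edits directories remain"); // hypothetical
}
{code}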

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171657#comment-13171657
 ] 

Andrew Purtell commented on HDFS-2699:
--

@Dhruba, yes I agree with you fully. From the HBase point of view optimizing 
IOPS in HDFS is very important.

> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the datafile and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually  bottlenecked on the number of random disk iops that the 
> storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2701) Cleanup FS* processIOError methods

2011-12-17 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2701:
--

Attachment: hdfs-2701.txt

Patch attached. Running test-patch now.

> Cleanup FS* processIOError methods
> --
>
> Key: HDFS-2701
> URL: https://issues.apache.org/jira/browse/HDFS-2701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.20.205.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 1.1.0
>
> Attachments: hdfs-2701.txt
>
>
> Let's rename the various "processIOError" methods to be more descriptive. The 
> current code makes it difficult to identify and reason about bug fixes. While 
> we're at it let's remove "Fatal" from the "Unable to sync the edit log" log 
> since it's not actually a fatal error (this is confusing to users). And 2NN 
> "Checkpoint done" should be info, not a warning (also confusing to users).
> Thanks to HDFS-1073 these issues don't exist on trunk or 23.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171653#comment-13171653
 ] 

dhruba borthakur commented on HDFS-2699:


Hi Andrew, all of the points you mentioned are valid and could decrease the 
number of iops needed for a particular workload. But my point is: if we keep 
the other pieces constant (amount of RAM, amount of flash, etc.), what can we 
do to reduce iops for the same workload? If the machine has more RAM, I would 
rather give all of it to the HBase block cache, because access through the 
HBase block cache is cheaper than going through the file system cache. The 
HBase block cache can also apply better caching policies (because it is closer 
to the application) than the OS file cache; this is the same argument for why 
databases typically do unbuffered I/O against the filesystem.

Most disks are getting larger and larger (4TB disks are coming next year), but 
iops per spindle have not changed much. Given that, an efficient storage 
system should strive to optimize iops, should it not?


> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the datafile and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually  bottlenecked on the number of random disk iops that the 
> storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2701) Cleanup FS* processIOError methods

2011-12-17 Thread Eli Collins (Created) (JIRA)
Cleanup FS* processIOError methods
--

 Key: HDFS-2701
 URL: https://issues.apache.org/jira/browse/HDFS-2701
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.20.205.0
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 1.1.0


Let's rename the various "processIOError" methods to be more descriptive. The 
current code makes it difficult to identify and reason about bug fixes. While 
we're at it let's remove "Fatal" from the "Unable to sync the edit log" log 
since it's not actually a fatal error (this is confusing to users). And 2NN 
"Checkpoint done" should be info, not a warning (also confusing to users).

Thanks to HDFS-1073 these issues don't exist on trunk or 23.
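
As a concrete illustration of the two logging changes being proposed (a 
sketch with assumed message text, not the attached patch):

{code}
// Hedged sketch of the logging cleanups described above:
LOG.error("Unable to sync the edit log.");            // "Fatal" dropped from the message
LOG.info("Checkpoint done. New Image Size: " + size); // was LOG.warn(...)
{code}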

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171643#comment-13171643
 ] 

Andrew Purtell commented on HDFS-2699:
--

IMHO, this is a design evolution question for HDFS. Is pread a first class use 
case? How many clients beyond HBase?

If so, I think it makes sense to consider changes to DN storage that reduce 
IOPS.

If not, and/or if changes to DN storage are too radical by consensus, then a 
means to optionally fadvise away data file pages seems worthwhile to try. There 
are other considerations that suggest deployments should use a reasonable 
amount of RAM; part of that will be available to the OS block cache.

There are various other alternatives: application-level checksums, mixed-device 
deployment (flash + disk), etc. Given the above two options, it may be a 
distraction to consider more options unless there is a compelling reason. (For 
example, optimizing IOPS for disk provides the same benefit for flash devices.)

> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the datafile and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually  bottlenecked on the number of random disk iops that the 
> storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171638#comment-13171638
 ] 

Uma Maheswara Rao G commented on HDFS-2700:
---

The test failures are unrelated to this patch; HDFS-2657 has already been 
raised for those failures.
The findbugs and javadoc comments are also unrelated.

> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations  is failing from last couple of builds
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171635#comment-13171635
 ] 

Hadoop QA commented on HDFS-2700:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12507785/HDFS-2700.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated 90 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.fs.http.server.TestHttpFSServer
  org.apache.hadoop.lib.servlet.TestServerWebApp

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1722//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1722//console

This message is automatically generated.

> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations  is failing from last couple of builds
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-554) BlockInfo.ensureCapacity may get a speedup from System.arraycopy()

2011-12-17 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171622#comment-13171622
 ] 

Aaron T. Myers commented on HDFS-554:
-

The patch looks good to me. +1 pending clean Jenkins results.

> BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
> --
>
> Key: HDFS-554
> URL: https://issues.apache.org/jira/browse/HDFS-554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Steve Loughran
>Assignee: Harsh J
>Priority: Minor
> Fix For: 0.24.0
>
> Attachments: HDFS-554.patch, HDFS-554.txt
>
>
> BlockInfo.ensureCapacity() uses a for() loop to copy the old array data into 
> the expanded array.  {{System.arraycopy()}} is generally much faster for 
> this, as it can do a bulk memory copy. There is also the typesafe Java6 
> {{Arrays.copyOf()}} to consider, though here it offers no tangible benefit.
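
For reference, a minimal sketch of the arraycopy-based expansion being 
described (the field name and sizing are assumptions for this sketch, not the 
literal attached patch):

{code}
// Hedged sketch: replace the element-by-element for() loop with a bulk copy.
private void ensureCapacity(int newLength) {
  if (triplets.length >= newLength) {
    return; // already large enough
  }
  Object[] old = triplets;
  triplets = new Object[newLength];
  System.arraycopy(old, 0, triplets, 0, old.length); // single bulk memory copy
}
{code}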

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2700:
--

Issue Type: Bug  (was: Test)

> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations  is failing from last couple of builds
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2700:
--

Assignee: Uma Maheswara Rao G
  Status: Patch Available  (was: Open)

> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations  is failing from last couple of builds
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2700:
--

Attachment: HDFS-2700.patch

> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Uma Maheswara Rao G
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations  is failing from last couple of builds
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171616#comment-13171616
 ] 

Uma Maheswara Rao G commented on HDFS-2700:
---

The problem here is that DatanodeProtocolClientSideTranslatorPB is not itself 
a proxy instance; it is a wrapper around the proxy instance of 
DatanodeProtocolPB.

So when the previous cluster is shut down, the clients are not cleared and 
remain cached. When a new cluster starts, it may pick up the old, invalid 
clients and hit EOFExceptions.

We should therefore just call close() on DatanodeProtocolClientSideTranslatorPB, 
which calls RPC.stopProxy with the real proxy instance (DatanodeProtocolPB).

Attached a patch that closes the DatanodeProtocolClientSideTranslatorPB.
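
A simplified sketch of that fix shape (illustrative, not the exact attached 
patch):

{code}
// Hedged sketch: the translator wraps the real proxy, so close() must hand
// the wrapped proxy, not the wrapper, to RPC.stopProxy.
public class DatanodeProtocolClientSideTranslatorPB implements java.io.Closeable {
  private final DatanodeProtocolPB rpcProxy; // the actual java.lang.reflect.Proxy

  public DatanodeProtocolClientSideTranslatorPB(DatanodeProtocolPB rpcProxy) {
    this.rpcProxy = rpcProxy;
  }

  @Override
  public void close() {
    // Passing the wrapper here would fail with "not a proxy instance".
    RPC.stopProxy(rpcProxy); // org.apache.hadoop.ipc.RPC
  }
}
{code}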


Thanks
Uma

> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Uma Maheswara Rao G
> Attachments: HDFS-2700.patch
>
>
> TestDataNodeMultipleRegistrations  is failing from last couple of builds
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-554) BlockInfo.ensureCapacity may get a speedup from System.arraycopy()

2011-12-17 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-554:
-

Attachment: HDFS-554.txt

Thanks for that catch Todd, you're right :)

> BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
> --
>
> Key: HDFS-554
> URL: https://issues.apache.org/jira/browse/HDFS-554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Steve Loughran
>Assignee: Harsh J
>Priority: Minor
> Fix For: 0.24.0
>
> Attachments: HDFS-554.patch, HDFS-554.txt
>
>
> BlockInfo.ensureCapacity() uses a for() loop to copy the old array data into 
> the expanded array.  {{System.arraycopy()}} is generally much faster for 
> this, as it can do a bulk memory copy. There is also the typesafe Java6 
> {{Arrays.copyOf()}} to consider, though here it offers no tangible benefit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171607#comment-13171607
 ] 

Uma Maheswara Rao G commented on HDFS-2700:
---

Some more information:

It looks like some of the proxy instances are not being cleaned up:

BP-239265342-67.195.138.31-1324137379885 (storage id 
DS-47228547-67.195.138.31-49285-1324137385405) registered with 
localhost/127.0.0.1:9930
2011-12-17 15:56:26,248 ERROR ipc.RPC (RPC.java:stopProxy(559)) - Tried to call 
RPC.stopProxy on an object that is not a proxy.
java.lang.IllegalArgumentException: not a proxy instance
at java.lang.reflect.Proxy.getInvocationHandler(Proxy.java:637)
at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:557)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.cleanUp(BPOfferService.java:450)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.run(BPOfferService.java:639)
at java.lang.Thread.run(Thread.java:662)
2011-12-17 15:56:26,248 ERROR ipc.RPC (RPC.java:stopProxy(559)) - Tried to call 
RPC.stopProxy on an object that is not a proxy.
java.lang.IllegalArgumentException: not a proxy instance
at java.lang.reflect.Proxy.getInvocationHandler(Proxy.java:637)
at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:557)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.cleanUp(BPOfferService.java:450)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.run(BPOfferService.java:639)
at java.lang.Thread.run(Thread.java:662)


> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Uma Maheswara Rao G
>
> TestDataNodeMultipleRegistrations  is failing from last couple of builds
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171604#comment-13171604
 ] 

dhruba borthakur commented on HDFS-2699:


Another option that I am going to try is to fadvise away pages from data files 
(because those are anyway cached in the HBase cache) so that more file system 
cache is available to cache data from checksum files. Do people think this is a 
good idea?
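
A hypothetical sketch of that idea; the {{Fadviser}} helper below is an 
assumed stand-in for whatever native binding would expose posix_fadvise, not 
an existing Hadoop API:

{code}
import java.io.FileDescriptor;

// Hypothetical sketch only; "Fadviser" is an assumed helper interface.
interface Fadviser {
  int POSIX_FADV_DONTNEED = 4; // value from Linux <fcntl.h>
  void fadvise(FileDescriptor fd, long offset, long len, int advice);
}

class DataPageEvictor {
  // Data pages are already cached by the HBase block cache, so evict them from
  // the OS page cache, leaving more room for checksum-file pages.
  static void dropDataFilePages(Fadviser f, FileDescriptor fd, long len) {
    f.fadvise(fd, 0, len, Fadviser.POSIX_FADV_DONTNEED);
  }
}
{code}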


> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the datafile and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually  bottlenecked on the number of random disk iops that the 
> storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171601#comment-13171601
 ] 

dhruba borthakur commented on HDFS-2699:


There are various alternatives like the ones you proposed. An application-level 
checksum (at the HBase block level) sounds easy to do. The disadvantage is that 
HDFS still has to generate/store checksums to periodically validate data that 
has not been accessed for a long time.

> by supporting two block format simultaneously at the expense of code 
> complexity
Are you saying that the same data is stored in two places, one in the current 
format and another in the format with inline checksums?



> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the datafile and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually  bottlenecked on the number of random disk iops that the 
> storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171599#comment-13171599
 ] 

Uma Maheswara Rao G commented on HDFS-2553:
---

{quote}
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
{quote}
The test failure is unrelated to this patch. Filed a separate JIRA for it: HDFS-2700.

> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171598#comment-13171598
 ] 

Hadoop QA commented on HDFS-2553:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12507782/HDFS-2553.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated 90 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1721//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1721//console

This message is automatically generated.

> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Uma Maheswara Rao G (Created) (JIRA)
TestDataNodeMultipleRegistrations is failing in trunk
-

 Key: HDFS-2700
 URL: https://issues.apache.org/jira/browse/HDFS-2700
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Uma Maheswara Rao G


TestDataNodeMultipleRegistrations  is failing from last couple of builds
https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk

2011-12-17 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171596#comment-13171596
 ] 

Uma Maheswara Rao G commented on HDFS-2700:
---

more info:
java.io.IOException: Failed on local exception: java.io.EOFException; Host 
Details : local host is: "asf001.sp2.ygridcore.net/67.195.138.31"; destination 
host is: ""localhost":9929; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
at org.apache.hadoop.ipc.Client.call(Client.java:1140)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:169)
at $Proxy14.getDatanodeReport(Unknown Source)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:127)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:81)
at $Proxy14.getDatanodeReport(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDatanodeReport(ClientNamenodeProtocolTranslatorPB.java:555)
at org.apache.hadoop.hdfs.DFSClient.datanodeReport(DFSClient.java:1443)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:1486)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.addNameNode(MiniDFSCluster.java:1904)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations.testMiniDFSClusterWithMultipleNN(TestDataNodeMultipleRegistrations.java:237)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)


> TestDataNodeMultipleRegistrations is failing in trunk
> -
>
> Key: HDFS-2700
> URL: https://issues.apache.org/jira/browse/HDFS-2700
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Uma Maheswara Rao G
>
> TestDataNodeMultipleRegistrations  is failing from last couple of builds
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2553:
--

Attachment: HDFS-2553.patch

Attached the same patch again to trigger Jenkins.

> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2553) BlockPoolSliceScanner spinning in loop

2011-12-17 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2553:
--

Attachment: (was: HDFS-2553.patch)

> BlockPoolSliceScanner spinning in loop
> --
>
> Key: HDFS-2553
> URL: https://issues.apache.org/jira/browse/HDFS-2553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: CPUUsage.jpg, HDFS-2553.patch
>
>
> Playing with trunk, I managed to get a DataNode in a situation where the 
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2486) Review issues with UnderReplicatedBlocks

2011-12-17 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171586#comment-13171586
 ] 

Hadoop QA commented on HDFS-2486:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12507779/HDFS-2486.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated 90 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1720//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1720//console

This message is automatically generated.

> Review issues with UnderReplicatedBlocks
> 
>
> Key: HDFS-2486
> URL: https://issues.apache.org/jira/browse/HDFS-2486
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Steve Loughran
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Attachments: HDFS-2486.patch
>
>
> Here are some things I've noted in the UnderReplicatedBlocks class that 
> someone else should review and consider if the code is correct. If not, they 
> are easy to fix.
> remove(Block block, int priLevel) is not synchronized, and as the inner 
> classes are not, there is a risk of race conditions there.
> some of the code assumes that getPriority can return the value LEVEL, and if 
> so does not attempt to queue the blocks. As this return value is not 
> currently possible, those checks can be removed. 
> The queue gives priority to blocks whose replication count is less than a 
> third of its expected count over those that are "normally under replicated". 
> While this is good for ensuring that files scheduled for large replication 
> are replicated fast, it may not be the best strategy for maintaining data 
> integrity. For that it may be better to give whichever blocks have only two 
> replicas priority over blocks that may, for example, already have 3 out of 10 
> copies in the filesystem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2362) More Improvements on NameNode Scalability

2011-12-17 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171582#comment-13171582
 ] 

Uma Maheswara Rao G commented on HDFS-2362:
---

Can someone please merge these improvements into the 0.23 branch as well? They 
have introduced a good amount of delta between trunk and 0.23, so we are not 
able to do direct merges of some other improvements like HDFS-1765.

> More Improvements on NameNode Scalability
> -
>
> Key: HDFS-2362
> URL: https://issues.apache.org/jira/browse/HDFS-2362
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Hairong Kuang
>
> This jira acts as an umbrella jira to track all the improvements we've done 
> recently to improve Namenode's performance, responsiveness, and hence 
> scalability. Those improvements include:
> 1. Incremental block reports (HDFS-395)
> 2. BlockManager.reportDiff optimization for processing block reports 
> (HDFS-2477)
> 3. Upgradable lock to allow simutaleous read operation while reportDiff is in 
> progress in processing block reports (HDFS-2490)
> 4. More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks (HDFS-2476)
> 5. Increase granularity of write operations in ReplicationMonitor thus 
> reducing contention for write lock (HDFS-2495)
> 6. Support variable block sizes
> 7. Release RPC handlers while waiting for edit log is synced to disk
> 8. Reduce network traffic pressure to the master rack where NN is located by 
> lowering read priority of the replicas on the rack
> 9. A standalone KeepAlive heartbeat thread
> 10. Reduce Multiple traversals of path directory to one for most namespace 
> manipulations
> 11. Move logging out of write lock section.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2486) Review issues with UnderReplicatedBlocks

2011-12-17 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171577#comment-13171577
 ] 

Uma Maheswara Rao G commented on HDFS-2486:
---

This patch addresses the second comment. Once the last comment is concluded, 
it can be handled as a separate JIRA (subtask) if required, depending on the 
code change involved.

> Review issues with UnderReplicatedBlocks
> 
>
> Key: HDFS-2486
> URL: https://issues.apache.org/jira/browse/HDFS-2486
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Steve Loughran
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Attachments: HDFS-2486.patch
>
>
> Here are some things I've noted in the UnderReplicatedBlocks class that 
> someone else should review and consider if the code is correct. If not, they 
> are easy to fix.
> remove(Block block, int priLevel) is not synchronized, and as the inner 
> classes are not, there is a risk of race conditions there.
> some of the code assumes that getPriority can return the value LEVEL, and if 
> so does not attempt to queue the blocks. As this return value is not 
> currently possible, those checks can be removed. 
> The queue gives priority to blocks whose replication count is less than a 
> third of its expected count over those that are "normally under replicated". 
> While this is good for ensuring that files scheduled for large replication 
> are replicated fast, it may not be the best strategy for maintaining data 
> integrity. For that it may be better to give whichever blocks have only two 
> replicas priority over blocks that may, for example, already have 3 out of 10 
> copies in the filesystem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2486) Review issues with UnderReplicatedBlocks

2011-12-17 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2486:
--

Assignee: Uma Maheswara Rao G
Target Version/s: 0.24.0
  Status: Patch Available  (was: Open)

Removed the unnecessary priLevel != LEVEL checks from the UnderReplicatedBlocks 
class, as the getPriority API will never return LEVEL (5).
The patch was mainly tested with:
TestReplication: verifies the update API.
TestReplicationPolicy: verifies the add functionality.
Other tests are not impacted by this change.




> Review issues with UnderReplicatedBlocks
> 
>
> Key: HDFS-2486
> URL: https://issues.apache.org/jira/browse/HDFS-2486
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Steve Loughran
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Attachments: HDFS-2486.patch
>
>
> Here are some things I've noted in the UnderReplicatedBlocks class that 
> someone else should review and consider if the code is correct. If not, they 
> are easy to fix.
> remove(Block block, int priLevel) is not synchronized, and as the inner 
> classes are not, there is a risk of race conditions there.
> some of the code assumes that getPriority can return the value LEVEL, and if 
> so does not attempt to queue the blocks. As this return value is not 
> currently possible, those checks can be removed. 
> The queue gives priority to blocks whose replication count is less than a 
> third of its expected count over those that are "normally under replicated". 
> While this is good for ensuring that files scheduled for large replication 
> are replicated fast, it may not be the best strategy for maintaining data 
> integrity. For that it may be better to give whichever blocks have only two 
> replicas priority over blocks that may, for example, already have 3 out of 10 
> copies in the filesystem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2486) Review issues with UnderReplicatedBlocks

2011-12-17 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2486:
--

Attachment: HDFS-2486.patch

> Review issues with UnderReplicatedBlocks
> 
>
> Key: HDFS-2486
> URL: https://issues.apache.org/jira/browse/HDFS-2486
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: HDFS-2486.patch
>
>
> Here are some things I've noted in the UnderReplicatedBlocks class that 
> someone else should review and consider if the code is correct. If not, they 
> are easy to fix.
> remove(Block block, int priLevel) is not synchronized, and as the inner 
> classes are not, there is a risk of race conditions there.
> some of the code assumes that getPriority can return the value LEVEL, and if 
> so does not attempt to queue the blocks. As this return value is not 
> currently possible, those checks can be removed. 
> The queue gives priority to blocks whose replication count is less than a 
> third of its expected count over those that are "normally under replicated". 
> While this is good for ensuring that files scheduled for large replication 
> are replicated fast, it may not be the best strategy for maintaining data 
> integrity. For that it may be better to give whichever blocks have only two 
> replicas priority over blocks that may, for example, already have 3 out of 10 
> copies in the filesystem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2640) Javadoc generation hangs

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171569#comment-13171569
 ] 

Hudson commented on HDFS-2640:
--

Integrated in Hadoop-Mapreduce-trunk #930 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/930/])
HDFS-2640. Javadoc generation hangs.

tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215354
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml


> Javadoc generation hangs
> 
>
> Key: HDFS-2640
> URL: https://issues.apache.org/jira/browse/HDFS-2640
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tom White
>Assignee: Tom White
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2640.patch
>
>
> Typing 'mvn javadoc:javadoc' causes the build to hang.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2694) Removal of Avro broke non-PB NN services

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171565#comment-13171565
 ] 

Hudson commented on HDFS-2694:
--

Integrated in Hadoop-Mapreduce-trunk #930 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/930/])
HDFS-2694. Removal of Avro broke non-PB NN services. Contributed by Aaron 
T. Myers.

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215364
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java


> Removal of Avro broke non-PB NN services
> 
>
> Key: HDFS-2694
> URL: https://issues.apache.org/jira/browse/HDFS-2694
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 0.24.0
>
> Attachments: HDFS-2694.patch, HDFS-2694.txt, HDFS-2694.txt
>
>
> RpcEngine implementations have to register themselves with an associated 
> RpcKind. Both WritableRpcEngine and ProtobufRpcEngine do this registration in 
> static initialization blocks. It used to be that the static initializer block 
> for WritableRpcEngine was triggered when AvroRpcEngine was initialized, since 
> this instantiated a WritableRpcEngine object. With AvroRpcEngine gone, 
> there's nothing in the NN to trigger the WritableRpcEngine static 
> initialization block. Therefore, those RPC services which still use Writable 
> and not PB no longer work.
> For example, if I try to run `hdfs groups' on trunk, which uses the 
> GetUserMappingsProtocol, this error gets spit out:
> {noformat}
> $ hdfs groups
> log4j:ERROR Could not find value for key log4j.appender.NullAppender
> log4j:ERROR Could not instantiate appender named "NullAppender".
> Exception in thread "main" java.io.IOException: Unknown rpc kind RPC_WRITABLE
>   at org.apache.hadoop.ipc.Client.call(Client.java:1136)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:205)
>   at $Proxy6.getGroupsForUser(Unknown Source)
>   at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
>   at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:56)
> {noformat}
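
As a minimal sketch following the diagnosis above (and not necessarily the 
committed fix): any startup code path that loads WritableRpcEngine explicitly 
will run its static registration block before the first Writable RPC arrives. 
The wrapper class and method below are hypothetical; only the engine class 
name is real.

{noformat}
class WritableRpcBootstrapSketch {
  static void ensureWritableRpcEngineRegistered() {
    try {
      // Class.forName() initializes the class, which runs the static
      // block where WritableRpcEngine registers itself for RPC_WRITABLE.
      Class.forName("org.apache.hadoop.ipc.WritableRpcEngine");
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException("WritableRpcEngine not on classpath", e);
    }
  }
}
{noformat}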

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2687) Tests are failing with ClassCastException, due to new protocol changes

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171566#comment-13171566
 ] 

Hudson commented on HDFS-2687:
--

Integrated in Hadoop-Mapreduce-trunk #930 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/930/])
HDFS-2687. Tests failing with ClassCastException post protobuf RPC changes. 
Contributed by Suresh Srinivas.

suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215366
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java


> Tests are failing with ClassCastException, due to new protocol changes 
> ---
>
> Key: HDFS-2687
> URL: https://issues.apache.org/jira/browse/HDFS-2687
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Suresh Srinivas
> Attachments: HDFS-2687.txt
>
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/lastCompletedBuild/testReport/
> java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.HdfsFileStatus 
> cannot be cast to org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$1.hasNext(DistributedFileSystem.java:452)
>   at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:1551)
>   at org.apache.hadoop.fs.FileSystem$5.next(FileSystem.java:1581)
>   at org.apache.hadoop.fs.FileSystem$5.next(FileSystem.java:1541)
>   at 
> org.apache.hadoop.fs.TestListFiles.testDirectory(TestListFiles.java:146)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2640) Javadoc generation hangs

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171557#comment-13171557
 ] 

Hudson commented on HDFS-2640:
--

Integrated in Hadoop-Mapreduce-0.23-Build #130 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/130/])
Merge -r 1215353:1215354 from trunk to branch-0.23. Fixes: HDFS-2640.

tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215355
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/pom.xml


> Javadoc generation hangs
> 
>
> Key: HDFS-2640
> URL: https://issues.apache.org/jira/browse/HDFS-2640
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tom White
>Assignee: Tom White
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2640.patch
>
>
> Typing 'mvn javadoc:javadoc' causes the build to hang.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2640) Javadoc generation hangs

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171550#comment-13171550
 ] 

Hudson commented on HDFS-2640:
--

Integrated in Hadoop-Hdfs-0.23-Build #110 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/110/])
Merge -r 1215353:1215354 from trunk to branch-0.23. Fixes: HDFS-2640.

tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215355
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/pom.xml


> Javadoc generation hangs
> 
>
> Key: HDFS-2640
> URL: https://issues.apache.org/jira/browse/HDFS-2640
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tom White
>Assignee: Tom White
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2640.patch
>
>
> Typing 'mvn javadoc:javadoc' causes the build to hang.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2684) Fix up some failing unit tests on HA branch

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171547#comment-13171547
 ] 

Hudson commented on HDFS-2684:
--

Integrated in Hadoop-Hdfs-HAbranch-build #19 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/19/])
HDFS-2684. Fix up some failing unit tests on HA branch. Contributed by Todd 
Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215241
Files : 
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestHeartbeatHandling.java
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java
* 
/hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java


> Fix up some failing unit tests on HA branch
> ---
>
> Key: HDFS-2684
> URL: https://issues.apache.org/jira/browse/HDFS-2684
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, test
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Critical
> Fix For: HA branch (HDFS-1623)
>
> Attachments: hdfs-2684.txt, hdfs-2684.txt, hdfs-2684.txt
>
>
> To keep moving quickly on the HA branch, we've committed some stuff even 
> though some unit tests are failing. This JIRA is to take a pass through the 
> failing unit tests and get back to green (or close to it). If anything turns 
> out to be a major amount of work I'll file separate JIRAs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2640) Javadoc generation hangs

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171545#comment-13171545
 ] 

Hudson commented on HDFS-2640:
--

Integrated in Hadoop-Hdfs-trunk #897 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/897/])
HDFS-2640. Javadoc generation hangs.

tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215354
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml


> Javadoc generation hangs
> 
>
> Key: HDFS-2640
> URL: https://issues.apache.org/jira/browse/HDFS-2640
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tom White
>Assignee: Tom White
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2640.patch
>
>
> Typing 'mvn javadoc:javadoc' causes the build to hang.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2687) Tests are failing with ClassCastException, due to new protocol changes

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171542#comment-13171542
 ] 

Hudson commented on HDFS-2687:
--

Integrated in Hadoop-Hdfs-trunk #897 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/897/])
HDFS-2687. Tests failing with ClassCastException post protobuf RPC changes. 
Contributed by Suresh Srinivas.

suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215366
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java


> Tests are failing with ClassCastException, due to new protocol changes 
> ---
>
> Key: HDFS-2687
> URL: https://issues.apache.org/jira/browse/HDFS-2687
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Assignee: Suresh Srinivas
> Attachments: HDFS-2687.txt
>
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/lastCompletedBuild/testReport/
> java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.HdfsFileStatus 
> cannot be cast to org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$1.hasNext(DistributedFileSystem.java:452)
>   at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:1551)
>   at org.apache.hadoop.fs.FileSystem$5.next(FileSystem.java:1581)
>   at org.apache.hadoop.fs.FileSystem$5.next(FileSystem.java:1541)
>   at 
> org.apache.hadoop.fs.TestListFiles.testDirectory(TestListFiles.java:146)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2694) Removal of Avro broke non-PB NN services

2011-12-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171541#comment-13171541
 ] 

Hudson commented on HDFS-2694:
--

Integrated in Hadoop-Hdfs-trunk #897 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/897/])
HDFS-2694. Removal of Avro broke non-PB NN services. Contributed by Aaron 
T. Myers.

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215364
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java


> Removal of Avro broke non-PB NN services
> 
>
> Key: HDFS-2694
> URL: https://issues.apache.org/jira/browse/HDFS-2694
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 0.24.0
>
> Attachments: HDFS-2694.patch, HDFS-2694.txt, HDFS-2694.txt
>
>
> RpcEngine implementations have to register themselves with an associated 
> RpcKind. Both WritableRpcEngine and ProtobufRpcEngine do this registration in 
> static initialization blocks. It used to be that the static initializer block 
> for WritableRpcEngine was triggered when AvroRpcEngine was initialized, since 
> this instantiated a WritableRpcEngine object. With AvroRpcEngine gone, 
> there's nothing in the NN to trigger the WritableRpcEngine static 
> initialization block. Therefore, those RPC services which still use Writable 
> and not PB no longer work.
> For example, if I try to run `hdfs groups' on trunk, which uses the 
> GetUserMappingsProtocol, this error gets spit out:
> {noformat}
> $ hdfs groups
> log4j:ERROR Could not find value for key log4j.appender.NullAppender
> log4j:ERROR Could not instantiate appender named "NullAppender".
> Exception in thread "main" java.io.IOException: Unknown rpc kind RPC_WRITABLE
>   at org.apache.hadoop.ipc.Client.call(Client.java:1136)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:205)
>   at $Proxy6.getGroupsForUser(Unknown Source)
>   at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
>   at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:56)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread Luke Lu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171539#comment-13171539
 ] 

Luke Lu commented on HDFS-2699:
---

bq. You'd need HFile v3 for this

Or come up with new checksumming compression codecs to wrap the existing 
codecs (including none) that don't have checksums.

> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata (checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the data file and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually bottlenecked on the number of random disk iops that the 
> storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread Luke Lu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171531#comment-13171531
 ] 

Luke Lu commented on HDFS-2699:
---

bq. The number of random reads issued by HBase is almost twice the iops shown 
by iostat. Each HBase random io translates to a positional read (pread) 
against HDFS.

As I mentioned in our last conversation, you can embed an application-level 
checksum in the HBase block (a la Hypertable) and turn off verifyChecksum in 
preads. You'd need HFile v3 for this, of course :)

bq. Any thoughts on how we can put data and checksums together in the same 
block file?

As discussed in HADOOP-1134, inline checksums not only make the code more 
complex, but also make in-place upgrade a lot more expensive (you have to copy 
the content). We could solve the latter by supporting two block formats 
simultaneously, at the expense of additional code complexity.
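
To make that trade-off concrete, here is a rough sketch of what an inline 
layout could look like: each fixed-size data chunk (e.g. 512 bytes, mirroring 
bytes-per-checksum) is written immediately followed by its CRC32, so one 
sequential read fetches both. The class and method below are purely 
hypothetical and correspond to no actual HDFS code.

{noformat}
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

class InlineChecksumSketch {
  // Writes one chunk of block data with its checksum adjacent on disk,
  // so a reader needs one iop instead of one per file (data + .meta).
  static void writeChunk(DataOutputStream blockFile, byte[] chunk, int len)
      throws IOException {
    CRC32 crc = new CRC32();
    crc.update(chunk, 0, len);
    blockFile.write(chunk, 0, len);            // the data chunk
    blockFile.writeInt((int) crc.getValue());  // its CRC32, inline
  }
}
{noformat}

The upgrade cost follows directly from this layout: converting an existing 
block means interleaving two files into one, which rewrites every byte rather 
than renaming anything.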


> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata (checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the data file and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually bottlenecked on the number of random disk iops that the 
> storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171493#comment-13171493
 ] 

dhruba borthakur commented on HDFS-2699:


The number of random reads issued by HBase is almost twice the iops shown by 
iostat. Each HBase random io translates to a positional read (pread) against 
HDFS.

In my workload, HBase is issuing 300 preads/sec, while iostat on the machine 
shows 600 reads/sec. I switched off "verifyChecksum" in the pread calls, and 
that reduces the iops (per iostat) to about 350/sec, validating the claim that 
storing data and checksum in two different files is very costly for an 
iops-bound workload.
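
A quick back-of-the-envelope check of those numbers (just the arithmetic the 
paragraph above implies, not measured output):

{noformat}
verifyChecksum on:  300 preads/s x 2 files (data + checksum) ~ 600 disk reads/s
verifyChecksum off: 300 preads/s x 1 file  (data only)       ~ 300-350 disk reads/s
{noformat}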

Any thoughts on how we can put data and checksums together in the same block 
file?

> Store data and checksums together in block file
> ---
>
> Key: HDFS-2699
> URL: https://issues.apache.org/jira/browse/HDFS-2699
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata (checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the data file and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually bottlenecked on the number of random disk iops that the 
> storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2699) Store data and checksums together in block file

2011-12-17 Thread dhruba borthakur (Created) (JIRA)
Store data and checksums together in block file
---

 Key: HDFS-2699
 URL: https://issues.apache.org/jira/browse/HDFS-2699
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur


The current implementation of HDFS stores the data in one block file and the 
metadata (checksum) in another block file. This means that every read from HDFS 
actually consumes two disk iops, one to the data file and one to the checksum 
file. This is a major problem for scaling HBase, because HBase is usually 
bottlenecked on the number of random disk iops that the storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira