[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Nigel Daley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047039#comment-13047039
 ] 

Nigel Daley commented on HDFS-941:
--

+1 for 0.22.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.
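
To make the intent concrete, a hedged client-side sketch of what reuse could look like (all helper names here are hypothetical; this is not the attached patch):

{code}
// Conceptual sketch only: after a read ends at a clean op boundary, the
// client keeps the socket and issues the next read op on the same stream
// instead of reconnecting for every operation.
Socket s = connectToDatanode(datanode);   // hypothetical helper
try {
  for (ReadRequest req : randomReads) {   // hypothetical request objects
    sendReadBlockOp(s, req);              // write the op header on the open stream
    readToEndOfRange(s, req);             // leaves the stream in a well-defined state
    // the connection is now reusable for the next operation
  }
} finally {
  s.close();
}
{code}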

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1409) The "register" method of the BackupNode class should be "UnsupportedActionException("register")"

2011-06-09 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047016#comment-13047016
 ] 

Konstantin Shvachko commented on HDFS-1409:
---

+1 Looks good

> The "register" method of the BackupNode class should be 
> "UnsupportedActionException("register")"
> 
>
> Key: HDFS-1409
> URL: https://issues.apache.org/jira/browse/HDFS-1409
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Ching-Shen Chen
>Priority: Trivial
> Fix For: 0.21.1
>
> Attachments: HDFS-1409.patch, HDFS-1409.patch
>
>
> The register method of the BackupNode class should be 
> "UnsupportedActionException("register")" rather than  
> "UnsupportedActionException("journal")".

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-06-09 Thread Hari A V (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046987#comment-13046987
 ] 

Hari A V commented on HDFS-1973:


Hi Aaron,

Thanks for the answer. I will watch these issues to get more information :-) 

-Hari

> HA: HDFS clients must handle namenode failover and switch over to the new 
> active namenode.
> --
>
> Key: HDFS-1973
> URL: https://issues.apache.org/jira/browse/HDFS-1973
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Suresh Srinivas
>Assignee: Aaron T. Myers
>
> During failover, a client must detect the current active namenode failure and 
> switch over to the new active namenode. The switch over might make use of IP 
> failover or some thing more elaborate such as zookeeper to discover the new 
> active.
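
As a purely illustrative sketch of the client-side behavior being asked for (a fragment with assumed names; no implementation exists yet on this issue):

{code}
// Try each configured namenode in turn; the active one serves the call.
interface Call<T> { T run(ClientProtocol nn) throws IOException; }

<T> T withFailover(List<ClientProtocol> namenodes, Call<T> call) throws IOException {
  IOException last = null;
  for (ClientProtocol nn : namenodes) {
    try {
      return call.run(nn);   // succeeds against the active namenode
    } catch (IOException e) {
      last = e;              // standby or unreachable: fail over to the next
    }
  }
  throw last;                // no active namenode could be reached
}
{code}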

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2055) Add hflush support to libhdfs

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046979#comment-13046979
 ] 

Hadoop QA commented on HDFS-2055:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12482015/HDFS-2055.patch
  against trunk revision 1134170.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/759//console

This message is automatically generated.

> Add hflush support to libhdfs
> -
>
> Key: HDFS-2055
> URL: https://issues.apache.org/jira/browse/HDFS-2055
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: libhdfs
>Reporter: Travis Crawford
> Attachments: HDFS-2055.patch
>
>
> libhdfs would be improved by adding support for hflush.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2055) Add hflush support to libhdfs

2011-06-09 Thread Travis Crawford (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Travis Crawford updated HDFS-2055:
--

Release Note: Add hdfsHFlush to libhdfs.
  Status: Patch Available  (was: Open)

> Add hflush support to libhdfs
> -
>
> Key: HDFS-2055
> URL: https://issues.apache.org/jira/browse/HDFS-2055
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: libhdfs
>Reporter: Travis Crawford
> Attachments: HDFS-2055.patch
>
>
> libhdfs would be improved by adding support for hflush.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2055) Add hflush support to libhdfs

2011-06-09 Thread Travis Crawford (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Travis Crawford updated HDFS-2055:
--

Attachment: HDFS-2055.patch

Add {{hdfsHFlush}} to libhdfs.

It's also viewable here, which might be easier to read:

https://github.com/traviscrawford/hadoop-hdfs/compare/apache:trunk...HDFS-2055_Add_hflush_support_to_libhdfs
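
For reference, the Java-side call such a wrapper would presumably delegate to; a minimal usage sketch (the path and write are illustrative, not taken from the patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HFlushDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/hflush-demo"));
    try {
      out.writeBytes("record\n");
      out.hflush();  // flush client buffers to the datanodes so readers
                     // can see the data before the file is closed
    } finally {
      out.close();
    }
  }
}
{code}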

> Add hflush support to libhdfs
> -
>
> Key: HDFS-2055
> URL: https://issues.apache.org/jira/browse/HDFS-2055
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: libhdfs
>Reporter: Travis Crawford
> Attachments: HDFS-2055.patch
>
>
> libhdfs would be improved by adding support for hflush.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2056) Update fetchdt usage

2011-06-09 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-2056:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I have committed this. Thanks to Tanping!

> Update fetchdt usage
> 
>
> Key: HDFS-2056
> URL: https://issues.apache.org/jira/browse/HDFS-2056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2056.patch
>
>
> Update the usage of fetchdt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes

2011-06-09 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1295:
-

Attachment: HDFS-1295_for_ymerge_v2.patch

It turns out HDFS-1295 is dependent on HDFS-900. Merged HDFS-900 to
yahoo-merge, but we now need a slightly modified port of HDFS-1295. Attached.

> Improve namenode restart times by short-circuiting the first block reports 
> from datanodes
> -
>
> Key: HDFS-1295
> URL: https://issues.apache.org/jira/browse/HDFS-1295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: dhruba borthakur
>Assignee: Matt Foley
> Fix For: 0.23.0
>
> Attachments: HDFS-1295_delta_for_trunk.patch, 
> HDFS-1295_for_ymerge.patch, HDFS-1295_for_ymerge_v2.patch, 
> IBR_shortcut_v2a.patch, IBR_shortcut_v3atrunk.patch, 
> IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v4atrunk.patch, IBR_shortcut_v6atrunk.patch, 
> IBR_shortcut_v7atrunk.patch, shortCircuitBlockReport_1.txt
>
>
> The namenode restart is dominated by the performance of processing block 
> reports. On a 2000 node cluster with 90 million blocks,  block report 
> processing takes 30 to 40 minutes. The namenode "diffs" the contents of the 
> incoming block report with the contents of the blocks map, and then applies 
> these diffs to the blocksMap, but in reality there is no need to compute the 
> "diff" because this is the first block report from the datanode.
> This code change improves block report processing time by 300%.
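
For intuition, a hedged sketch of the shortcut (helper names are assumptions, not the attached patches):

{code}
// First block report after (re)registration: skip the diff entirely.
void processReport(DatanodeDescriptor node, BlockListAsLongs report) {
  if (node.numBlocks() == 0) {
    addStoredBlocks(node, report);   // just populate the blocksMap directly
  } else {
    diffAndApply(node, report);      // the expensive diff is only for later reports
  }
}
{code}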

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2056) Update fetchdt usage

2011-06-09 Thread Tanping Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046955#comment-13046955
 ] 

Tanping Wang commented on HDFS-2056:


# The two new Findbugs warnings are not related to this patch. This is a
simple usage update change.
# No tests are included, as this is a usage update change.
# For the same reason, the core test failures are not related to this change.


> Update fetchdt usage
> 
>
> Key: HDFS-2056
> URL: https://issues.apache.org/jira/browse/HDFS-2056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2056.patch
>
>
> Update the usage of fetchdt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time

2011-06-09 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-2057.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

I committed the patch to 204, 205 and branch-0.20-security. Thank you Bharath.

> Wait time to terminate the threads causing unit tests to take longer time
> -
>
> Key: HDFS-2057
> URL: https://issues.apache.org/jira/browse/HDFS-2057
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.204.0, 0.20.205.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.205.0
>
> Attachments: HDFS-2057-1.patch
>
>
> As part of the datanode process hang fix, this code was introduced in
> 0.20.204 to clean up all the waiting threads:
> -  try {
> -    readPool.awaitTermination(10, TimeUnit.SECONDS);
> -  } catch (InterruptedException e) {
> -    LOG.info("Exception occured in doStop:" + e.getMessage());
> -  }
> -  readPool.shutdownNow();
> This was clearly meant for production, but all the unit tests use
> MiniDFSCluster and MiniMRCluster for shutdown, which waits on this part of
> the code. Due to this, we saw an increase in unit test run times, so this
> code is being removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2030:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I committed the patch. Thank you Bharath.

> Fix the usability of namenode upgrade command
> -
>
> Key: HDFS-2030
> URL: https://issues.apache.org/jira/browse/HDFS-2030
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch
>
>
> Fixing the Namenode upgrade option along the same lines as the Namenode
> format option.
> If a clusterid is not given, one will be automatically generated for the
> upgrade, but if a clusterid is given, it will be honored.
>  
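
A hedged sketch of the described behavior (helper names are assumptions, not the actual patch; setClusterID() is the NNStorage method mentioned in the review thread):

{code}
// Mirror -format: auto-generate a clusterid when none is supplied,
// and honor the operator's id otherwise.
String cid = parseClusterIdArg(args);   // hypothetical: null when -clusterid absent
if (cid == null || cid.isEmpty()) {
  cid = newClusterID();                 // hypothetical auto-generation helper
}
storage.setClusterID(cid);
{code}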

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time

2011-06-09 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046935#comment-13046935
 ] 

Suresh Srinivas commented on HDFS-2057:
---

This reverts to the previous code. +1 for the patch.

> Wait time to terminate the threads causing unit tests to take longer time
> -
>
> Key: HDFS-2057
> URL: https://issues.apache.org/jira/browse/HDFS-2057
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.204.0, 0.20.205.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.205.0
>
> Attachments: HDFS-2057-1.patch
>
>
> As part of the datanode process hang fix, this code was introduced in
> 0.20.204 to clean up all the waiting threads:
> -  try {
> -    readPool.awaitTermination(10, TimeUnit.SECONDS);
> -  } catch (InterruptedException e) {
> -    LOG.info("Exception occured in doStop:" + e.getMessage());
> -  }
> -  readPool.shutdownNow();
> This was clearly meant for production, but all the unit tests use
> MiniDFSCluster and MiniMRCluster for shutdown, which waits on this part of
> the code. Due to this, we saw an increase in unit test run times, so this
> code is being removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time

2011-06-09 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2057:
-

Attachment: HDFS-2057-1.patch

Attaching the patch.

> Wait time to terminate the threads causing unit tests to take longer time
> -
>
> Key: HDFS-2057
> URL: https://issues.apache.org/jira/browse/HDFS-2057
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.204.0, 0.20.205.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.205.0
>
> Attachments: HDFS-2057-1.patch
>
>
> As part of the datanode process hang fix, this code was introduced in
> 0.20.204 to clean up all the waiting threads:
> -  try {
> -    readPool.awaitTermination(10, TimeUnit.SECONDS);
> -  } catch (InterruptedException e) {
> -    LOG.info("Exception occured in doStop:" + e.getMessage());
> -  }
> -  readPool.shutdownNow();
> This was clearly meant for production, but all the unit tests use
> MiniDFSCluster and MiniMRCluster for shutdown, which waits on this part of
> the code. Due to this, we saw an increase in unit test run times, so this
> code is being removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time

2011-06-09 Thread Bharath Mundlapudi (JIRA)
Wait time to terminate the threads causing unit tests to take longer time
-

 Key: HDFS-2057
 URL: https://issues.apache.org/jira/browse/HDFS-2057
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0, 0.20.205.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.205.0


As part of the datanode process hang fix, this code was introduced in
0.20.204 to clean up all the waiting threads:

-  try {
-    readPool.awaitTermination(10, TimeUnit.SECONDS);
-  } catch (InterruptedException e) {
-    LOG.info("Exception occured in doStop:" + e.getMessage());
-  }
-  readPool.shutdownNow();

This was clearly meant for production, but all the unit tests use
MiniDFSCluster and MiniMRCluster for shutdown, which waits on this part of
the code. Due to this, we saw an increase in unit test run times, so this
code is being removed.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046929#comment-13046929
 ] 

Suresh Srinivas commented on HDFS-2030:
---

The Findbugs warnings and the TestHDFSCLI failure are unrelated to this patch.

> Fix the usability of namenode upgrade command
> -
>
> Key: HDFS-2030
> URL: https://issues.apache.org/jira/browse/HDFS-2030
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch
>
>
> Fixing the Namenode upgrade option along the same lines as the Namenode
> format option.
> If a clusterid is not given, one will be automatically generated for the
> upgrade, but if a clusterid is given, it will be honored.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2041) Some mtimes and atimes are lost when edit logs are replayed

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046931#comment-13046931
 ] 

Hadoop QA commented on HDFS-2041:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481995/hdfs-2041.txt
  against trunk revision 1134124.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/758//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/758//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/758//console

This message is automatically generated.

> Some mtimes and atimes are lost when edit logs are replayed
> ---
>
> Key: HDFS-2041
> URL: https://issues.apache.org/jira/browse/HDFS-2041
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2041.txt, hdfs-2041.txt
>
>
> The refactoring in HDFS-2003 allowed findbugs to expose two potential bugs:
> - the atime field logged with OP_MKDIR is unused
> - the timestamp field logged with OP_CONCAT_DELETE is unused
> The concat issue is definitely real. The atime for MKDIR might always be
> identical to mtime, in which case it could be ignored.
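
A hedged sketch of the concat half of a fix (op and method names are assumptions based on the HDFS-2003 op classes, not necessarily the attached patch):

{code}
// On replay, apply the timestamp logged with OP_CONCAT_DELETE instead of
// dropping it on the floor.
case OP_CONCAT_DELETE: {
  ConcatDeleteOp op = (ConcatDeleteOp) rawOp;             // names assumed
  fsDir.unprotectedConcat(op.trg, op.srcs, op.timestamp); // was ignoring op.timestamp
  break;
}
{code}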

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2056) Update fetchdt usage

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046925#comment-13046925
 ] 

Hadoop QA commented on HDFS-2056:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481988/HDFS-2056.patch
  against trunk revision 1134031.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/757//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/757//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/757//console

This message is automatically generated.

> Update fetchdt usage
> 
>
> Key: HDFS-2056
> URL: https://issues.apache.org/jira/browse/HDFS-2056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2056.patch
>
>
> Update the usage of fetchdt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046910#comment-13046910
 ] 

Todd Lipcon commented on HDFS-2054:
---

hrm... that's a pain. I guess our options are (a) parsing exception messages, 
or (b) passing the Socket object itself to BlockSender such that it can 
determine whether it's still open. Any other good ideas?

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients
> tear down the FSInputStream/connection before reaching the end of the
> stream, this error message often pops up. Since these are not really errors,
> and especially not the fault of the data node, the message should at least
> be toned down.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046907#comment-13046907
 ] 

Kihwal Lee commented on HDFS-2054:
--

I tried SocketOutputStream.isOpen() in BlockSender.sendChunk(), but it seems
that even after an EPIPE, isOpen() is not guaranteed to return false.

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients
> tear down the FSInputStream/connection before reaching the end of the
> stream, this error message often pops up. Since these are not really errors,
> and especially not the fault of the data node, the message should at least
> be toned down.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic

2011-06-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2003:
--

Affects Version/s: 0.23.0
Fix Version/s: 0.23.0

Updating fix versions; this was done in trunk as well.

> Separate FSEditLog reading logic from editLog memory state building logic
> -
>
> Key: HDFS-2003
> URL: https://issues.apache.org/jira/browse/HDFS-2003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Edit log branch (HDFS-1073), 0.23.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073), 0.23.0
>
> Attachments: 2003-delta.txt, HDFS-2003-replicationfix-delta.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> hdfs-2003.txt, hdfs-2003.txt, hdfs-2003.txt
>
>
> Currently FSEditLogLoader has code for reading from an InputStream
> interleaved with code which updates the FSNamesystem and FSDirectory. This
> makes it difficult to read an edit log without having a whole load of other
> objects initialised, which is problematic if you want to do things like
> count how many transactions are in a file, etc.
> This patch separates the reading of the stream and the building of the memory 
> state. 
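
The shape of the separation, as a hedged illustration (interface and method names are assumptions, not the attached diffs):

{code}
// Reading is pure: no FSNamesystem or FSDirectory needed.  What happens to
// each op is the caller's business, e.g. applying it or just counting.
interface OpVisitor { void accept(FSEditLogOp op) throws IOException; }

long readLog(EditLogInputStream in, OpVisitor visitor) throws IOException {
  long numTxns = 0;
  FSEditLogOp op;
  while ((op = in.readOp()) != null) {
    visitor.accept(op);
    numTxns++;
  }
  return numTxns;   // counting transactions no longer requires a namesystem
}
{code}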

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2041) Some mtimes and atimes are lost when edit logs are replayed

2011-06-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2041:
--

Status: Patch Available  (was: Open)

> Some mtimes and atimes are lost when edit logs are replayed
> ---
>
> Key: HDFS-2041
> URL: https://issues.apache.org/jira/browse/HDFS-2041
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2041.txt, hdfs-2041.txt
>
>
> The refactoring in HDFS-2003 allowed findbugs to expose two potential bugs:
> - the atime field logged with OP_MKDIR is unused
> - the timestamp field logged with OP_CONCAT_DELETE is unused
> The concat issue is definitely real. The atime for MKDIR might always be
> identical to mtime, in which case it could be ignored.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes

2011-06-09 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046903#comment-13046903
 ] 

Matt Foley commented on HDFS-1295:
--

Response to test-patch:
-1 core tests: the TestHDFSCLI failure is unrelated.
-1 tests included: this is simply a completion of the previously approved patch.

Committed HDFS-1295_delta_for_trunk.patch to trunk.

> Improve namenode restart times by short-circuiting the first block reports 
> from datanodes
> -
>
> Key: HDFS-1295
> URL: https://issues.apache.org/jira/browse/HDFS-1295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: dhruba borthakur
>Assignee: Matt Foley
> Fix For: 0.23.0
>
> Attachments: HDFS-1295_delta_for_trunk.patch, 
> HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, 
> IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, 
> shortCircuitBlockReport_1.txt
>
>
> The namenode restart is dominated by the performance of processing block 
> reports. On a 2000 node cluster with 90 million blocks,  block report 
> processing takes 30 to 40 minutes. The namenode "diffs" the contents of the 
> incoming block report with the contents of the blocks map, and then applies 
> these diffs to the blocksMap, but in reality there is no need to compute the 
> "diff" because this is the first block report from the datanode.
> This code change improves block report processing time by 300%.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2041) Some mtimes and atimes are lost when edit logs are replayed

2011-06-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2041:
--

Attachment: hdfs-2041.txt

Rebased on trunk

> Some mtimes and atimes are lost when edit logs are replayed
> ---
>
> Key: HDFS-2041
> URL: https://issues.apache.org/jira/browse/HDFS-2041
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2041.txt, hdfs-2041.txt
>
>
> The refactoring in HDFS-2003 allowed findbugs to expose two potential bugs:
> - the atime field logged with OP_MKDIR is unused
> - the timestamp field logged with OP_CONCAT_DELETE is unused
> The concat issue is definitely real. The atime for MKDIR might always be
> identical to mtime, in which case it could be ignored.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046896#comment-13046896
 ] 

Hadoop QA commented on HDFS-2030:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481981/HDFS-2030-3.patch
  against trunk revision 1134031.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/756//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/756//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/756//console

This message is automatically generated.

> Fix the usability of namenode upgrade command
> -
>
> Key: HDFS-2030
> URL: https://issues.apache.org/jira/browse/HDFS-2030
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch
>
>
> Fixing the Namenode upgrade option along the same lines as the Namenode
> format option.
> If a clusterid is not given, one will be automatically generated for the
> upgrade, but if a clusterid is given, it will be honored.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-2027) 1073: Image inspector should return finalized logs before unfinalized logs

2011-06-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-2027.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to branch, thanks Eli!

> 1073: Image inspector should return finalized logs before unfinalized logs
> --
>
> Key: HDFS-2027
> URL: https://issues.apache.org/jira/browse/HDFS-2027
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2027.txt
>
>
> Found this small bug while testing multiple NNs under failure conditions on 
> the 1073 branch. When the 2NN calls getEditLogManifest(), it expects a list 
> of finalized logs. In the case that one of the edit log directories had 
> failed and recovered, there would be some txid for which there was an 
> edit_N_inprogress and an edits_N-M (finalized). The edit log manifest needs 
> to see the finalized one when it exists.
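
The selection rule, as a one-line hedged sketch (names hypothetical):

{code}
// For a given start txid, a finalized segment (edits_N-M) must shadow a
// leftover in-progress file (edits_N_inprogress) from a recovered directory.
File pickSegment(File finalized, File inProgress) {
  return finalized != null ? finalized : inProgress;
}
{code}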

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-2048) 1073: Improve upgrade tests from 0.22

2011-06-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-2048.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to branch, thanks for review, Eli!

> 1073: Improve upgrade tests from 0.22
> -
>
> Key: HDFS-2048
> URL: https://issues.apache.org/jira/browse/HDFS-2048
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2048.txt
>
>
> TestDFSUpgradeFromImage currently tests an upgrade from 0.22, but doesn't 
> test that the image checksum field is properly respected during the upgrade.
> This JIRA is to improve those tests by also testing the case where the image 
> has been corrupted.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-2047) Improve TestNamespace and TestEditLog in 1073 branch

2011-06-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-2047.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to branch, thanks for review, Eli.

> Improve TestNamespace and TestEditLog in 1073 branch
> 
>
> Key: HDFS-2047
> URL: https://issues.apache.org/jira/browse/HDFS-2047
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2047.txt
>
>
> These tests currently have some test cases that don't make sense after 
> HDFS-1073. This JIRA is to update these tests to do the equivalent things on 
> 1073.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2056) Update fetchdt usage

2011-06-09 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-2056:
---

Hadoop Flags: [Reviewed]
  Status: Patch Available  (was: Open)

> Update fetchdt usage
> 
>
> Key: HDFS-2056
> URL: https://issues.apache.org/jira/browse/HDFS-2056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2056.patch
>
>
> Update the usage of fetchdt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2056) Update fetchdt usage

2011-06-09 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046873#comment-13046873
 ] 

Jitendra Nath Pandey commented on HDFS-2056:


+1.

> Update fetchdt usage
> 
>
> Key: HDFS-2056
> URL: https://issues.apache.org/jira/browse/HDFS-2056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2056.patch
>
>
> Update the usage of fetchdt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2056) Update fetchdt usage

2011-06-09 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-2056:
---

Attachment: HDFS-2056.patch

> Update fetchdt usage
> 
>
> Key: HDFS-2056
> URL: https://issues.apache.org/jira/browse/HDFS-2056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2056.patch
>
>
> Update the usage of fetchdt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2056) Update fetchdt usage

2011-06-09 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-2056:
---

  Component/s: tools
   documentation
Affects Version/s: 0.23.0
Fix Version/s: 0.23.0

> Update fetchdt usage
> 
>
> Key: HDFS-2056
> URL: https://issues.apache.org/jira/browse/HDFS-2056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2056.patch
>
>
> Update the usage of fetchdt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2056) Update fetchdt usage

2011-06-09 Thread Tanping Wang (JIRA)
Update fetchdt usage


 Key: HDFS-2056
 URL: https://issues.apache.org/jira/browse/HDFS-2056
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tanping Wang
Assignee: Tanping Wang
Priority: Minor


Update the usage of fetchdt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2027) 1073: Image inspector should return finalized logs before unfinalized logs

2011-06-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046848#comment-13046848
 ] 

Eli Collins commented on HDFS-2027:
---

+1 lgtm

> 1073: Image inspector should return finalized logs before unfinalized logs
> --
>
> Key: HDFS-2027
> URL: https://issues.apache.org/jira/browse/HDFS-2027
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2027.txt
>
>
> Found this small bug while testing multiple NNs under failure conditions on 
> the 1073 branch. When the 2NN calls getEditLogManifest(), it expects a list 
> of finalized logs. In the case that one of the edit log directories had 
> failed and recovered, there would be some txid for which there was an 
> edit_N_inprogress and an edits_N-M (finalized). The edit log manifest needs 
> to see the finalized one when it exists.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2030:
--

Hadoop Flags: [Reviewed]
  Status: Patch Available  (was: Open)

> Fix the usability of namenode upgrade command
> -
>
> Key: HDFS-2030
> URL: https://issues.apache.org/jira/browse/HDFS-2030
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch
>
>
> Fixing the Namenode upgrade option along the same lines as the Namenode
> format option.
> If a clusterid is not given, one will be automatically generated for the
> upgrade, but if a clusterid is given, it will be honored.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046826#comment-13046826
 ] 

Suresh Srinivas commented on HDFS-2030:
---

+1 for the change

> Fix the usability of namenode upgrade command
> -
>
> Key: HDFS-2030
> URL: https://issues.apache.org/jira/browse/HDFS-2030
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch
>
>
> Fixing the Namenode upgrade option along the same lines as the Namenode
> format option.
> If a clusterid is not given, one will be automatically generated for the
> upgrade, but if a clusterid is given, it will be honored.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2048) 1073: Improve upgrade tests from 0.22

2011-06-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046819#comment-13046819
 ] 

Eli Collins commented on HDFS-2048:
---

+1  looks great

> 1073: Improve upgrade tests from 0.22
> -
>
> Key: HDFS-2048
> URL: https://issues.apache.org/jira/browse/HDFS-2048
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2048.txt
>
>
> TestDFSUpgradeFromImage currently tests an upgrade from 0.22, but doesn't 
> test that the image checksum field is properly respected during the upgrade.
> This JIRA is to improve those tests by also testing the case where the image 
> has been corrupted.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2030:
-

Attachment: HDFS-2030-3.patch

Did some more minor cleanup of the comments and added more description to
the test class.

Please find the attached patch.

> Fix the usability of namenode upgrade command
> -
>
> Key: HDFS-2030
> URL: https://issues.apache.org/jira/browse/HDFS-2030
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch
>
>
> Fixing the Namenode upgrade option along the same lines as the Namenode
> format option.
> If a clusterid is not given, one will be automatically generated for the
> upgrade, but if a clusterid is given, it will be honored.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2047) Improve TestNamespace and TestEditLog in 1073 branch

2011-06-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046811#comment-13046811
 ] 

Eli Collins commented on HDFS-2047:
---

+1 looks great.

The TODO in the WRITE_STORAGE_ONE case in TestSaveNamespace is out of scope
for 1073; shall we file a new jira (save namespace should succeed as long as
there's at least one valid storage dir)? Seems like we could/should fix that
in parallel.

> Improve TestNamespace and TestEditLog in 1073 branch
> 
>
> Key: HDFS-2047
> URL: https://issues.apache.org/jira/browse/HDFS-2047
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2047.txt
>
>
> These tests currently have some test cases that don't make sense after 
> HDFS-1073. This JIRA is to update these tests to do the equivalent things on 
> 1073.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046808#comment-13046808
 ] 

Kihwal Lee commented on HDFS-2054:
--

> I'm not in favor of parsing the exception text for behavior-altering
> things. But for deciding whether to log at debug vs warn level, it
> seems OK to me.

This sounds reasonable.

> Another thought is to check something like socket.isInputShutdown()
> or socket.isConnected()? Maybe we can assume that any case where we
> get an IOE but the socket was then found to be disconnected is OK.
> If we had a local IOE with the transferTo, the socket would still be up.

This is even better, IMO.

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients
> tear down the FSInputStream/connection before reaching the end of the
> stream, this error message often pops up. Since these are not really errors,
> and especially not the fault of the data node, the message should at least
> be toned down.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046804#comment-13046804
 ] 

Todd Lipcon commented on HDFS-2054:
---

Yea, it sucks that Java doesn't give us a way to get at the underlying errno in 
these cases. For the IOEs thrown by the hadoop-native code in common, we 
actually have an Errno enum that makes life easy.

I'm not in favor of parsing the exception text for behavior-altering things. 
But for deciding whether to log at debug vs warn level, it seems OK to me.

Another thought is to check something like socket.isInputShutdown() or
socket.isConnected()? Maybe we can assume that any case where we get an IOE
but the socket was then found to be disconnected is OK. If we had a local IOE
with the transferTo, the socket would still be up.

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients
> tear down the FSInputStream/connection before reaching the end of the
> stream, this error message often pops up. Since these are not really errors,
> and especially not the fault of the data node, the message should at least
> be toned down.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2030:
-

Attachment: HDFS-2030-2.patch

Attached the patch.

> Fix the usability of namenode upgrade command
> -
>
> Key: HDFS-2030
> URL: https://issues.apache.org/jira/browse/HDFS-2030
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch
>
>
> Fixing the Namenode upgrade option along the same lines as the Namenode
> format option.
> If a clusterid is not given, one will be automatically generated for the
> upgrade, but if a clusterid is given, it will be honored.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046801#comment-13046801
 ] 

Kihwal Lee commented on HDFS-2054:
--

Last time, I tried what you describe with EAGAIN in transferTo(), in an
attempt to avoid doing epoll() every time even before sending anything. Some
folks were not thrilled about parsing the text. If it can be done in a
portable/i18n-friendly way and people do not object to the idea itself...

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients
> tear down the FSInputStream/connection before reaching the end of the
> stream, this error message often pops up. Since these are not really errors,
> and especially not the fault of the data node, the message should at least
> be toned down.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046800#comment-13046800
 ] 

Bharath Mundlapudi commented on HDFS-2030:
--

Thanks for the review, Suresh.
My comments inline.

1.1 Missing banner - done.
1.2 This method is package protected; this unit test just tests this function
instead of using the time-consuming MiniDFSCluster.
1.3 Removed the null and empty checks.
1.4 BlockPoolID is autogenerated. I have now modified the tests to not mock
this.
1.5 Added assertEquals where necessary.
1.6 Split into multiple tests.

2.1 Since setBlockPoolID() and setClusterID() are in NNStorage, I moved this
function to that class, which solves the problem.
2.2 Renamed the function.
2.3 Moved the comments outside the function and moved the if condition inside
the method.

Attaching the patch with these changes.

> Fix the usability of namenode upgrade command
> -
>
> Key: HDFS-2030
> URL: https://issues.apache.org/jira/browse/HDFS-2030
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2030-1.patch
>
>
> Fixing the Namenode upgrade option along the same lines as the Namenode
> format option.
> If a clusterid is not given, one will be automatically generated for the
> upgrade, but if a clusterid is given, it will be honored.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046795#comment-13046795
 ] 

Todd Lipcon commented on HDFS-2054:
---

Maybe we can check the exception type and message, and only log a warning
for unexpected ones? E.g., "Connection reset by peer" and "Broken pipe" are
expected exceptions, but anything else should be logged at WARN level.
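
A hedged sketch of that policy (illustrative only, not a committed fix; the JDK message strings are platform-dependent):

{code}
import java.io.IOException;

// Would live in BlockSender: expected client-side closures get logged
// quietly, anything else stays loud.
static boolean isExpectedClosure(IOException e) {
  String m = e.getMessage();
  return m != null && (m.contains("Broken pipe")
                    || m.contains("Connection reset by peer"));
}

// in the catch block around transferToFully():
//   if (isExpectedClosure(ioe)) LOG.debug("client closed connection", ioe);
//   else LOG.warn("error sending block to client", ioe);
{code}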

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients
> tear down the FSInputStream/connection before reaching the end of the
> stream, this error message often pops up. Since these are not really errors,
> and especially not the fault of the data node, the message should at least
> be toned down.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2055) Add hflush support to libhdfs

2011-06-09 Thread Travis Crawford (JIRA)
Add hflush support to libhdfs
-

 Key: HDFS-2055
 URL: https://issues.apache.org/jira/browse/HDFS-2055
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: libhdfs
Reporter: Travis Crawford


libhdfs would be improved by adding support for hflush.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes

2011-06-09 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046790#comment-13046790
 ] 

Suresh Srinivas commented on HDFS-1295:
---

+1 for the yahoo-merge patch also

> Improve namenode restart times by short-circuiting the first block reports 
> from datanodes
> -
>
> Key: HDFS-1295
> URL: https://issues.apache.org/jira/browse/HDFS-1295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: dhruba borthakur
>Assignee: Matt Foley
> Fix For: 0.23.0
>
> Attachments: HDFS-1295_delta_for_trunk.patch, 
> HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, 
> IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, 
> shortCircuitBlockReport_1.txt
>
>
> The namenode restart is dominated by the performance of processing block 
> reports. On a 2000 node cluster with 90 million blocks,  block report 
> processing takes 30 to 40 minutes. The namenode "diffs" the contents of the 
> incoming block report with the contents of the blocks map, and then applies 
> these diffs to the blocksMap, but in reality there is no need to compute the 
> "diff" because this is the first block report from the datanode.
> This code change improves block report processing time by 300%.
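
As an aside, a toy model of the short-circuit idea (for illustration only; the 
names and types below are invented and do not match the actual namenode code):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model: on the first report there is nothing to diff against, so
// the reported blocks are recorded directly; later reports are diffed.
class BlockReportModel {
  private final Map<String, Set<Long>> blocksByNode =
      new HashMap<String, Set<Long>>();

  void processReport(String nodeId, List<Long> reportedBlocks) {
    Set<Long> known = blocksByNode.get(nodeId);
    if (known == null) {
      // First block report from this datanode: skip the diff entirely.
      blocksByNode.put(nodeId, new HashSet<Long>(reportedBlocks));
      return;
    }
    // Subsequent reports: reconcile the report with the known state.
    Set<Long> toAdd = new HashSet<Long>(reportedBlocks);
    toAdd.removeAll(known);
    Set<Long> toRemove = new HashSet<Long>(known);
    toRemove.removeAll(reportedBlocks);
    known.addAll(toAdd);
    known.removeAll(toRemove);
  }
}
{code}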

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes

2011-06-09 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046787#comment-13046787
 ] 

Suresh Srinivas commented on HDFS-1295:
---

+1 for the trunk patch

> Improve namenode restart times by short-circuiting the first block reports 
> from datanodes
> -
>
> Key: HDFS-1295
> URL: https://issues.apache.org/jira/browse/HDFS-1295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: dhruba borthakur
>Assignee: Matt Foley
> Fix For: 0.23.0
>
> Attachments: HDFS-1295_delta_for_trunk.patch, 
> HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, 
> IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, 
> shortCircuitBlockReport_1.txt
>
>
> The namenode restart is dominated by the performance of processing block 
> reports. On a 2000 node cluster with 90 million blocks,  block report 
> processing takes 30 to 40 minutes. The namenode "diffs" the contents of the 
> incoming block report with the contents of the blocks map, and then applies 
> these diffs to the blocksMap, but in reality there is no need to compute the 
> "diff" because this is the first block report from the datanode.
> This code change improves block report processing time by 300%.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)

2011-06-09 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reassigned HDFS-2053:
-

Assignee: Michael Noll

Hey Michael - thank you for the excellent report!

In summary, the condition used to warn in FSDirectory#computeContentSummary has 
a bug: it compares the cached value for the directory not to a computed value 
for that directory, but to a computed value that includes the directory and its 
siblings. 

The bug results in a spurious warning; it doesn't impact, e.g., the correctness 
of quotas. Given this I think two things are reasonable:
# Remove the warning (which removes the bug)
# Compute the correct summary for just that directory (your patch)

The latter sounds good to me. Allocating a 4-long array for each level of the 
directory hierarchy isn't bad, and this method isn't on a hot path.

Nit: I'd change the array allocation to the following, since we assume summary 
has length 4, and it should be faster.

{noformat}
assert 4 == summary.length;
long[] subtreeSummary = new long[]{0,0,0,0};
{noformat}

Wrt testing, how about adding the following right after space is calculated:

{noformat}
assert -1 == node.getDsQuota() || space == subtreeSummary[3];
{noformat}

Asserts are enabled by default when the tests are run; if TestQuota doesn't 
trigger this assert, then add a test similar to what you did manually which will 
trigger it.
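
Putting the two suggestions together, a simplified sketch of the resulting 
shape (not the actual patch, and eliding this directory's own accounting) might 
read:

{code}
// Simplified sketch only. Compute this directory's subtree into a fresh
// array so the quota check never sees totals accumulated from siblings.
long[] computeContentSummary(long[] summary) {
  assert 4 == summary.length;
  long[] subtreeSummary = new long[]{0, 0, 0, 0};
  if (children != null) {
    for (INode child : children) {
      child.computeContentSummary(subtreeSummary);
    }
  }
  // ... account for this directory itself in subtreeSummary ...
  long space = diskspaceConsumed();
  // The testing assert suggested above: cached and computed diskspace
  // must agree for any directory with a quota set.
  assert -1 == getDsQuota() || space == subtreeSummary[3];
  // Fold this subtree into the caller's running totals.
  for (int i = 0; i < 4; i++) {
    summary[i] += subtreeSummary[i];
  }
  return summary;
}
{code}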

Also, please generate a patch against trunk (HDFS-2053_v2.txt doesn't apply for 
me).

Thanks!

> NameNode detects "Inconsistent diskspace" for directories with quota-enabled 
> subdirectories (introduced by HDFS-1377)
> -
>
> Key: HDFS-2053
> URL: https://issues.apache.org/jira/browse/HDFS-2053
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0
> Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch 
> applied.
> My impression is that the same issue also exists in the other branches to 
> which the HDFS-1377 patch has been applied (see description).
>Reporter: Michael Noll
>Assignee: Michael Noll
>Priority: Minor
> Fix For: 0.20.3, 0.20.204.0, 0.20.205.0
>
> Attachments: HDFS-2053_v1.txt, HDFS-2053_v2.txt
>
>
> *How to reproduce*
> {code}
> # create test directories
> $ hadoop fs -mkdir /hdfs-1377/A
> $ hadoop fs -mkdir /hdfs-1377/B
> $ hadoop fs -mkdir /hdfs-1377/C
> # ...add some test data (few kB or MB) to all three dirs...
> # set space quota for subdir C only
> $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
> # the following two commands _on the parent dir_ trigger the warning
> $ hadoop fs -dus /hdfs-1377
> $ hadoop fs -count -q /hdfs-1377
> {code}
> Warning message in the namenode logs:
> {code}
> 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
> {code}
> Note that the commands are run on the _parent directory_ but the warning is 
> shown for the _subdirectory_ with space quota.
> *Background*
> The bug was introduced by the HDFS-1377 patch, which is currently committed 
> to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, 
> branch-0.20-security-205 and release-0.20.3-rc2.  In the patch, 
> {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was 
> updated to trigger the warning above if the cached and computed diskspace 
> values are not the same for a directory with quota.
> The warning is written by {{computeContentSummary(long[] summary)}} in 
> {{INodeDirectory}}. In the method an inode's children are recursively walked 
> through while the {{summary}} parameter is passed and updated along the way.
> {code}
>   /** {@inheritDoc} */
>   long[] computeContentSummary(long[] summary) {
> if (children != null) {
>   for (INode child : children) {
> child.computeContentSummary(summary);
>   }
> }
> {code}
> The condition that triggers the warning message compares the current node's 
> cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding 
> field in {{summary}}.
> {code}
>   if (-1 != node.getDsQuota() && space != summary[3]) {
> NameNode.LOG.warn("Inconsistent diskspace for directory "
>   +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
> {code}
> However {{summary}} may already include diskspace information from other 
> inodes at this point (i.e. from different subtrees than the subtree of the 
> node for which the warning message is shown; in our example for the tree at 
> {{/hdfs-1377}}, {{summary}} can already contain information from 
> {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode 
> {{/hdfs-1377/C}}.

[jira] [Updated] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes

2011-06-09 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1295:
-

Attachment: HDFS-1295_for_ymerge.patch

Attaching patch ported to yahoo-merge branch.

Turning off "Patch Available" so Hudson doesn't try to run test-patch on a 
non-trunk patch.

> Improve namenode restart times by short-circuiting the first block reports 
> from datanodes
> -
>
> Key: HDFS-1295
> URL: https://issues.apache.org/jira/browse/HDFS-1295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: dhruba borthakur
>Assignee: Matt Foley
> Fix For: 0.23.0
>
> Attachments: HDFS-1295_delta_for_trunk.patch, 
> HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, 
> IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, 
> shortCircuitBlockReport_1.txt
>
>
> The namenode restart is dominated by the performance of processing block 
> reports. On a 2000 node cluster with 90 million blocks,  block report 
> processing takes 30 to 40 minutes. The namenode "diffs" the contents of the 
> incoming block report with the contents of the blocks map, and then applies 
> these diffs to the blocksMap, but in reality there is no need to compute the 
> "diff" because this is the first block report from the datanode.
> This code change improves block report processing time by 300%.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes

2011-06-09 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1295:
-

Status: Open  (was: Patch Available)

> Improve namenode restart times by short-circuiting the first block reports 
> from datanodes
> -
>
> Key: HDFS-1295
> URL: https://issues.apache.org/jira/browse/HDFS-1295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: dhruba borthakur
>Assignee: Matt Foley
> Fix For: 0.23.0
>
> Attachments: HDFS-1295_delta_for_trunk.patch, 
> HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, 
> IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, 
> shortCircuitBlockReport_1.txt
>
>
> The namenode restart is dominated by the performance of processing block 
> reports. On a 2000 node cluster with 90 million blocks,  block report 
> processing takes 30 to 40 minutes. The namenode "diffs" the contents of the 
> incoming block report with the contents of the blocks map, and then applies 
> these diffs to the blocksMap, but in reality there is no need to compute the 
> "diff" because this is the first block report from the datanode.
> This code change improves block report processing time by 300%.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-06-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046767#comment-13046767
 ] 

Aaron T. Myers commented on HDFS-1973:
--

Hi Hari,

bq. Can you please elaborate a little bit on your area of interest with 
ZOOKEEPER-1080?

As noted in Sanjay's design doc, one proposal for detecting NN failure would be 
to use an external ZK service. The HDFS proposal doesn't go into great detail 
on this, but it suggests using ZK with a heartbeat mechanism to see if the NN 
is still alive. I personally like the ZK recipe better (i.e. using ephemeral + 
sequence nodes).

Another possible use for ZK in the implementation of NN HA would be to use ZK 
as the source of truth for clients to determine the active NN. This would seem 
to flow naturally from the part of the ZK recipe which says "Applications may 
consider creating a separate znode to acknowledge that the leader has 
executed the leader procedure." If NN HA were to utilize an implementation of 
the ZK leader election recipe, then perhaps this "leader-procedure-complete 
znode" could store the IP or hostname of the active NN which clients could use.

I haven't read the design doc posted on ZOOKEEPER-1080 yet. I'll go ahead and 
do that and post my comments there.

I should also mention that we have not settled upon what strategy we'll take to 
do NN failure detection or client failover. As noted in Sanjay's design doc, 
we're also strongly considering using virtual IPs for client failover.

> HA: HDFS clients must handle namenode failover and switch over to the new 
> active namenode.
> --
>
> Key: HDFS-1973
> URL: https://issues.apache.org/jira/browse/HDFS-1973
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Suresh Srinivas
>Assignee: Aaron T. Myers
>
> During failover, a client must detect the current active namenode failure and 
> switch over to the new active namenode. The switch over might make use of IP 
> failover or some thing more elaborate such as zookeeper to discover the new 
> active.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046765#comment-13046765
 ] 

Kihwal Lee commented on HDFS-2054:
--

At minimum, it will get rid of the annoying stack trace. 

transferTo() does not exactly make it easy to deal with different exceptions 
differently. I believe things like EAGAIN were fixed now, but to deal with 
others you have to parse the error message itself, which is rather gross. 
Ideally we want to deal with EAGAIN, EPIPE, etc. separately, and if something 
else happens, print an error message. 

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046760#comment-13046760
 ] 

stack commented on HDFS-2054:
-

@Kihwal Do we think this is enough to address this issue?  I see loads of it 
running hbase loadings on 0.22.

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046756#comment-13046756
 ] 

stack commented on HDFS-941:


Yeah, my 0.22 version fails against trunk (trunk already has guava, etc.)

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046755#comment-13046755
 ] 

stack commented on HDFS-941:


So, that would leave 48 beers that I need to buy (And Nigel probably wants two) 
-- I can get a keg?

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046754#comment-13046754
 ] 

Kihwal Lee commented on HDFS-2054:
--

It may reveal interesting errors in the future, so the log level is being 
lowered to warn. 

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2054:
-

Attachment: HDFS-2054.patch

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046749#comment-13046749
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481963/941.22.txt
  against trunk revision 1134031.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/755//console

This message is automatically generated.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046744#comment-13046744
 ] 

Eli Collins commented on HDFS-941:
--

Make that two beers (52/48?). I reviewed an earlier version of this patch but 
if Nigel is game I think it's suitable for 22 as well.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2054:
-

  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-941:
---

Attachment: 941.22.txt

Forgot --no-prefix.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046739#comment-13046739
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481962/941.22.txt
  against trunk revision 1134031.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/754//console

This message is automatically generated.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, HDFS-941-1.patch, HDFS-941-2.patch, 
> HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, 
> HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046738#comment-13046738
 ] 

stack commented on HDFS-941:


Todd, I'll buy you a beer to go 51/49 in favor of 0.22 commit.  If Nigel wants 
me to make a case, I could do it here or in another issue?


> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, HDFS-941-1.patch, HDFS-941-2.patch, 
> HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, 
> HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-941:
---

Attachment: 941.22.txt

Here is my backport of Todd's final patch.  The main differences are adding in 
guava and the removal of TestDataXceiver (the util works differently in TRUNK).

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, HDFS-941-1.patch, HDFS-941-2.patch, 
> HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, 
> HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-941:
-

Hadoop Flags: [Reviewed]

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic

2011-06-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2003:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk, thanks Ivan! I will try to merge this into the 1073 branch 
this afternoon or evening.

> Separate FSEditLog reading logic from editLog memory state building logic
> -
>
> Key: HDFS-2003
> URL: https://issues.apache.org/jira/browse/HDFS-2003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: 2003-delta.txt, HDFS-2003-replicationfix-delta.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> hdfs-2003.txt, hdfs-2003.txt, hdfs-2003.txt
>
>
> Currently FSEditLogLoader has code for reading from an InputStream 
> interleaved with code which updates the FSNameSystem and FSDirectory. This 
> makes it difficult to read an edit log without having a whole load of other 
> objects initialised, which is problematic if you want to do things like count 
> how many transactions are in a file, etc. 
> This patch separates the reading of the stream and the building of the memory 
> state. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046727#comment-13046727
 ] 

Todd Lipcon commented on HDFS-2003:
---

+1, good stuff! I'll commit momentarily

> Separate FSEditLog reading logic from editLog memory state building logic
> -
>
> Key: HDFS-2003
> URL: https://issues.apache.org/jira/browse/HDFS-2003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: 2003-delta.txt, HDFS-2003-replicationfix-delta.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> hdfs-2003.txt, hdfs-2003.txt, hdfs-2003.txt
>
>
> Currently FSEditLogLoader has code for reading from an InputStream 
> interleaved with code which updates the FSNameSystem and FSDirectory. This 
> makes it difficult to read an edit log without having a whole load of other 
> objects initialised, which is problematic if you want to do things like count 
> how many transactions are in a file, etc. 
> This patch separates the reading of the stream and the building of the memory 
> state. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2054:
-

Description: The addition of ERROR was part of HDFS-1527. In environments 
where clients tear down FSInputStream/connection before reaching the end of 
stream, this error message often pops up. Since these are not really errors and 
especially not the fault of data node, the message should be toned down at 
least.   (was: The addition of ERROR was part of HDFS-1527. In environments 
where clients tear down FSInputStream/connection before reaching the end of 
stream, this error message often pops up. Since these are not really errors and 
especially not the fault of data node, the message should be toned down at 
least. Assigning to the author of HDFS-1527.)
   Assignee: Kihwal Lee  (was: Patrick Kling)

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Patrick Kling (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046725#comment-13046725
 ] 

Patrick Kling commented on HDFS-2054:
-

If I remember correctly, the bug fixed by HDFS-1527 was causing the affected 
transfers to fail silently. That's why I added this message. If it is polluting 
the log file, I have no objection to downgrading this to a warning.

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Patrick Kling
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 
> Assigning to the author of HDFS-1527.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046724#comment-13046724
 ] 

Todd Lipcon commented on HDFS-941:
--

Also, big thanks to: bc for authoring the majority of the patch and test cases, 
Sam Rash for reviews, and Stack and Kihwal for both code review and cluster 
testing. Great team effort spanning 4 companies!

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046722#comment-13046722
 ] 

Todd Lipcon commented on HDFS-941:
--

Committed to trunk.

I'm 50/50 on whether this should go into the 0.22 branch as well. Like Stack 
said, it's a nice carrot to help convince HBase users to try out 0.22. But, 
it's purely an optimization and on the riskier side as far as these things go. 
I guess I'll ping Nigel?

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-236) Random read benchmark for DFS

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046718#comment-13046718
 ] 

stack commented on HDFS-236:


Reread Raghu's comments above.  It's (still) great.


> Random read benchmark for DFS
> -
>
> Key: HDFS-236
> URL: https://issues.apache.org/jira/browse/HDFS-236
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Attachments: HDFS-236.patch, RndRead-TestDFSIO.patch
>
>
> We should have at least one random read benchmark that can be run with the 
> rest of the Hadoop benchmarks regularly.
> Please provide benchmark ideas or requirements.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046716#comment-13046716
 ] 

Kihwal Lee commented on HDFS-941:
-

They were pure readers and didn't write/report anything until the end.  I just 
filed HDFS-2054 for the error message. If you find the other JIRA that was 
already filed, please dupe one to the other.

+1 for commit.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-09 Thread Kihwal Lee (JIRA)
BlockSender.sendChunk() prints ERROR for connection closures encountered  
during transferToFully()
--

 Key: HDFS-2054
 URL: https://issues.apache.org/jira/browse/HDFS-2054
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.22.0, 0.23.0
Reporter: Kihwal Lee
Assignee: Patrick Kling


The addition of ERROR was part of HDFS-1527. In environments where clients tear 
down FSInputStream/connection before reaching the end of stream, this error 
message often pops up. Since these are not really errors and especially not the 
fault of data node, the message should be toned down at least. Assigning to the 
author of HDFS-1527.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046706#comment-13046706
 ] 

Todd Lipcon commented on HDFS-941:
--

Regarding duplicate connections: also keep in mind that the caching only 
applies at the read side. So, assuming there's some output as well, there will 
be a socket for each of those streams.
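
For illustration, the general shape of a read-side socket cache might look like 
the toy sketch below (the names are invented, and the real client code 
additionally handles expiry and stream-state checks):

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.HashMap;
import java.util.Map;

// Toy read-side cache: reuse a connection to a datanode only when the
// previous read left the stream in a well-defined state.
class ReadSocketCache {
  private final Map<InetSocketAddress, Socket> cache =
      new HashMap<InetSocketAddress, Socket>();

  synchronized Socket get(InetSocketAddress dn) throws IOException {
    Socket s = cache.remove(dn);
    if (s != null && s.isConnected() && !s.isClosed()) {
      return s; // reuse the cached connection
    }
    Socket fresh = new Socket();
    fresh.connect(dn);
    return fresh;
  }

  // Called after a clean end-of-block read; writes always get their
  // own socket, so output streams are never cached here.
  synchronized void release(InetSocketAddress dn, Socket s) {
    cache.put(dn, s);
  }
}
{code}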

I agree we should fix the "sendChunks" error messages separately. I think JD 
might have filed a JIRA about this a few weeks ago. I'll see if I can dig it up.

Kihwal: are you +1 on commit now as well?

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046700#comment-13046700
 ] 

stack commented on HDFS-941:


+1 on commit for latest version of patch.

I've been running over the last few hours.  I no longer see "Client 
/10.4.9.34did not send a valid status code after reading" (fix the space on 
commit), nor do I see the "Got error for OP_READ_BLOCK" exceptions.  I still 
have the BlockSender.sendChunks exceptions, but they are something else (that 
we need to fix).

Nice test you have over there Kihwal!

My test was a 5-node cluster running hbase on a 451-patched 0.22.  The loading 
was random reads running in MR and then another random-read test being done via 
a bunch of clients.  The cache was disabled, so all reads went to the FS for 
data.  I also had random writing going on concurrently.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046661#comment-13046661
 ] 

Kihwal Lee commented on HDFS-941:
-

OK, I see it's from BlockSender.java:407. It really shouldn't say ERROR since 
clients can close connections at any time, but I agree that this needs to be 
addressed in separate work.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046650#comment-13046650
 ] 

stack commented on HDFS-941:


@Kihwal I see lots of those sendChunks exceptions but I don't think they're 
related.  Testing the latest addition to the patch...

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1409) The "register" method of the BackupNode class should be "UnsupportedActionException("register")"

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046649#comment-13046649
 ] 

Hadoop QA commented on HDFS-1409:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12455111/HDFS-1409.patch
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.server.namenode.TestBackupNode

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/753//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/753//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/753//console

This message is automatically generated.

> The "register" method of the BackupNode class should be 
> "UnsupportedActionException("register")"
> 
>
> Key: HDFS-1409
> URL: https://issues.apache.org/jira/browse/HDFS-1409
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Ching-Shen Chen
>Priority: Trivial
> Fix For: 0.21.1
>
> Attachments: HDFS-1409.patch, HDFS-1409.patch
>
>
> The register method of the BackupNode class should be 
> "UnsupportedActionException("register")" rather than  
> "UnsupportedActionException("journal")".

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046643#comment-13046643
 ] 

Kihwal Lee commented on HDFS-941:
-

I am retesting with Todd's patch and I don't see the messages anymore. Instead, 
I see more of "BlockSender.sendChunks() exception: java.io.IOException: Broken 
pipe" from DNs. 

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046626#comment-13046626
 ] 

Hadoop QA commented on HDFS-2002:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481935/hdfs-2002.patch
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.server.namenode.TestBackupNode
  org.apache.hadoop.hdfs.server.namenode.TestSafeMode

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/752//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/752//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/752//console

This message is automatically generated.

> Incorrect computation of needed blocks in getTurnOffTip()
> -
>
> Key: HDFS-2002
> URL: https://issues.apache.org/jira/browse/HDFS-2002
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Matthias Eckert
>  Labels: newbie
> Fix For: 0.22.0
>
> Attachments: hdfs-2002.patch
>
>
> {{SafeModeInfo.getTurnOffTip()}} under-reports the number of blocks needed to 
> reach the safemode threshold.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1409) The "register" method of the BackupNode class should be "UnsupportedActionException("register")"

2011-06-09 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1409:
--

Status: Patch Available  (was: Open)

> The "register" method of the BackupNode class should be 
> "UnsupportedActionException("register")"
> 
>
> Key: HDFS-1409
> URL: https://issues.apache.org/jira/browse/HDFS-1409
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Ching-Shen Chen
>Priority: Trivial
> Fix For: 0.21.1
>
> Attachments: HDFS-1409.patch, HDFS-1409.patch
>
>
> The register method of the BackupNode class should be 
> "UnsupportedActionException("register")" rather than  
> "UnsupportedActionException("journal")".

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()

2011-06-09 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-2002:
--

Assignee: Matthias Eckert

> Incorrect computation of needed blocks in getTurnOffTip()
> -
>
> Key: HDFS-2002
> URL: https://issues.apache.org/jira/browse/HDFS-2002
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Matthias Eckert
>  Labels: newbie
> Fix For: 0.22.0
>
> Attachments: hdfs-2002.patch
>
>
> {{SafeModeInfo.getTurnOffTip()}} under-reports the number of blocks needed to 
> reach the safemode threshold.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()

2011-06-09 Thread Matthias Eckert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Eckert updated HDFS-2002:
--

Status: Patch Available  (was: Open)

> Incorrect computation of needed blocks in getTurnOffTip()
> -
>
> Key: HDFS-2002
> URL: https://issues.apache.org/jira/browse/HDFS-2002
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>  Labels: newbie
> Fix For: 0.22.0
>
> Attachments: hdfs-2002.patch
>
>
> {{SafeModeInfo.getTurnOffTip()}} under-reports the number of blocks needed to 
> reach the safemode threshold.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046583#comment-13046583
 ] 

Kihwal Lee commented on HDFS-941:
-

Regarding the duplicate connections, it makes sense because the input stream 
cache is per file, and it is quite possible that clients read blocks belonging 
to two files that live on the same DN within the window of 3 reads.  

I will look at the one happening during task initialization. Maybe they just 
stop reading in the middle of the stream by design.  Since one message will 
show up for every new map task, how about changing the message to DEBUG after 
we are done with testing?

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()

2011-06-09 Thread Matthias Eckert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Eckert updated HDFS-2002:
--

Attachment: hdfs-2002.patch

Log a warning if the threshold is larger than 1.
Correct the number of remaining nodes and blocks.
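
For context, here is a minimal sketch of the computation getTurnOffTip() has 
to get right. The names follow the surrounding discussion, but the actual 
SafeModeInfo fields are assumptions, so treat it as an illustration rather 
than the patch itself:

{code}
// Hedged sketch, not the patch: blocks still needed before the safemode
// threshold is met. The threshold is a fraction (e.g. 0.999), so the
// target count must be rounded up or the tip under-reports.
// (The patch also warns when threshold > 1, which can never be reached.)
long neededBlocks(long blockTotal, long blockSafe, double threshold) {
  long target = (long) Math.ceil(threshold * blockTotal);
  return Math.max(0L, target - blockSafe);
}
{code}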

> Incorrect computation of needed blocks in getTurnOffTip()
> -
>
> Key: HDFS-2002
> URL: https://issues.apache.org/jira/browse/HDFS-2002
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>  Labels: newbie
> Fix For: 0.22.0
>
> Attachments: hdfs-2002.patch
>
>
> {{SafeModeInfo.getTurnOffTip()}} under-reports the number of blocks needed to 
> reach the safemode threshold.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-236) Random read benchmark for DFS

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046572#comment-13046572
 ] 

Kihwal Lee commented on HDFS-236:
-

* Some of the test.io.randomread.* settings deserve a spot in the command-line 
args. 
* The buffer size can be used as the read size in random reads. I see no reason 
to separate the two in random read mode.
* The default behavior is that one random reader operates on just one file out 
of N files. Since it already has the ability to limit the number of files each 
reader can access, it might be better to make it work on all N files by default 
(see the sketch below).
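
A minimal sketch of that all-N-files pattern, with the read size taken from 
the buffer size as suggested above. The class and parameters are illustrative, 
not the actual TestDFSIO code:

{code}
// Hedged sketch, not the HDFS-236 patch itself: each reader keeps one
// stream per file and issues positioned reads of bufferSize bytes at
// random offsets, jumping among all N files.
import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RandomReadSketch {
  public static void readRandomly(Path[] files, long fileSize,
      int bufferSize, int reads) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataInputStream[] streams = new FSDataInputStream[files.length];
    for (int i = 0; i < files.length; i++) {
      streams[i] = fs.open(files[i]);        // cached for the reader's lifetime
    }
    byte[] buf = new byte[bufferSize];
    Random rnd = new Random();
    for (int i = 0; i < reads; i++) {
      int f = rnd.nextInt(streams.length);   // jump among all N files
      long off = (long) (rnd.nextDouble() * (fileSize - bufferSize));
      streams[f].readFully(off, buf, 0, bufferSize);  // pread, no seek
    }
    for (FSDataInputStream in : streams) {
      in.close();
    }
  }
}
{code}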


> Random read benchmark for DFS
> -
>
> Key: HDFS-236
> URL: https://issues.apache.org/jira/browse/HDFS-236
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Attachments: HDFS-236.patch, RndRead-TestDFSIO.patch
>
>
> We should have at least one  random read benchmark that can be run with rest 
> of Hadoop benchmarks regularly.
> Please provide benchmark  ideas or requirements.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-1410) The doCheckpoint() method should be invoked every hour

2011-06-09 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved HDFS-1410.
---

Resolution: Duplicate

This was fixed in HDFS-1572.  Resolving.

> The doCheckpoint() method should be invoked every hour
> --
>
> Key: HDFS-1410
> URL: https://issues.apache.org/jira/browse/HDFS-1410
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Ching-Shen Chen
> Fix For: 0.21.1
>
> Attachments: HDFS-1410.patch, HDFS-1410.patch
>
>
> The doCheckpoint() method should be invoked every hour rather than five 
> minutes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046526#comment-13046526
 ] 

Kihwal Lee commented on HDFS-941:
-

Good catch and fix!  I took a close look at the open connections each reader 
has and sometimes saw more than one connection to the same DN. I will see if 
that is fixed by Todd's fix. Otherwise I will look further to determine whether 
it is an issue.

The test I did was primarily for exercising the socket cache itself.  To make 
it more interesting, the socket cache size was lowered to 3 and dfs.replication 
to 1.  I used the random read test (work in progress) in HDFS-236 on a cluster 
with 8 data nodes.  200 x 170 MB files were created.  200 readers (25 on each 
DN) read the 200 files randomly, 64K at a time, jumping among files, for about 
6 hours last night. Each reader caches a DFSInputStream for each of the 200 
files during its lifetime. I checked the client/server logs afterward.

** I saw 25 of the "did not send a valid status code after reading. Will close 
connection" warnings around task initialization (the readers are map tasks) on 
each data node. They all look local, so they are likely accessing the job 
conf/jar files, which are replicated and available on all eight data nodes, 
unlike regular data files, or they are accessing the local DN for some other 
reason during this time period. I need to check whether this needs to be fixed. 
 
** While running, there were 3 ESTABLISHED connections per process and some 
number of sockets in TIME_WAIT at all times. This means the socket cache is not 
leaking anything, clients are not denied new connections, and eviction is 
working.

** The only thing that seems a bit odd is the symptom I mentioned above: 
duplicate connections in the socket cache. I will try to reproduce it with 
Todd's latest fix.
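
To make the cache behavior above concrete, here is a minimal sketch of a 
small, address-keyed socket cache with FIFO eviction. It is an illustration of 
the mechanism under test, assuming FIFO eviction, not the actual HDFS-941 code:

{code}
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;

public class SocketCacheSketch {
  private final LinkedList<Map.Entry<SocketAddress, Socket>> cache =
      new LinkedList<Map.Entry<SocketAddress, Socket>>();
  private final int capacity;   // e.g. 3, as in the test above

  public SocketCacheSketch(int capacity) { this.capacity = capacity; }

  // Take a cached socket to the given datanode, or return null so the
  // caller opens a fresh connection.
  public synchronized Socket get(SocketAddress addr) {
    Iterator<Map.Entry<SocketAddress, Socket>> it = cache.iterator();
    while (it.hasNext()) {
      Map.Entry<SocketAddress, Socket> e = it.next();
      if (e.getKey().equals(addr)) {
        it.remove();
        return e.getValue();
      }
    }
    return null;
  }

  // Return a socket after a clean read. Entries are keyed by address,
  // so two streams returning sockets to the same DN can legitimately
  // leave duplicate entries, as observed above.
  public synchronized void put(SocketAddress addr, Socket sock) {
    cache.addLast(
        new AbstractMap.SimpleEntry<SocketAddress, Socket>(addr, sock));
    while (cache.size() > capacity) {     // evict (close) the oldest
      try {
        cache.removeFirst().getValue().close();
      } catch (IOException ignored) {
      }
    }
  }
}
{code}

Evicted sockets are closed on the client side, which is consistent with the 
TIME_WAIT entries described above.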


> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046499#comment-13046499
 ] 

Hadoop QA commented on HDFS-2053:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481918/HDFS-2053_v2.txt
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/751//console

This message is automatically generated.

> NameNode detects "Inconsistent diskspace" for directories with quota-enabled 
> subdirectories (introduced by HDFS-1377)
> -
>
> Key: HDFS-2053
> URL: https://issues.apache.org/jira/browse/HDFS-2053
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0
> Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch 
> applied.
> My impression is that the same issue exists also in the other branches where 
> the HDFS-1377 patch has been applied to (see description).
>Reporter: Michael Noll
>Priority: Minor
> Fix For: 0.20.3, 0.20.204.0, 0.20.205.0
>
> Attachments: HDFS-2053_v1.txt, HDFS-2053_v2.txt
>
>
> *How to reproduce*
> {code}
> # create test directories
> $ hadoop fs -mkdir /hdfs-1377/A
> $ hadoop fs -mkdir /hdfs-1377/B
> $ hadoop fs -mkdir /hdfs-1377/C
> # ...add some test data (few kB or MB) to all three dirs...
> # set space quota for subdir C only
> $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
> # the following two commands _on the parent dir_ trigger the warning
> $ hadoop fs -dus /hdfs-1377
> $ hadoop fs -count -q /hdfs-1377
> {code}
> Warning message in the namenode logs:
> {code}
> 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
> {code}
> Note that the commands are run on the _parent directory_ but the warning is 
> shown for the _subdirectory_ with space quota.
> *Background*
> The bug was introduced by the HDFS-1377 patch, which is currently committed 
> to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, 
> branch-0.20-security-205 and release-0.20.3-rc2.  In the patch, 
> {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was 
> updated to trigger the warning above if the cached and computed diskspace 
> values are not the same for a directory with quota.
> The warning is written by {{computeContentSummary(long[] summary)}} in 
> {{INodeDirectory}}. In the method an inode's children are recursively walked 
> through while the {{summary}} parameter is passed and updated along the way.
> {code}
>   /** {@inheritDoc} */
>   long[] computeContentSummary(long[] summary) {
> if (children != null) {
>   for (INode child : children) {
> child.computeContentSummary(summary);
>   }
> }
> {code}
> The condition that triggers the warning message compares the current node's 
> cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding 
> field in {{summary}}.
> {code}
>   if (-1 != node.getDsQuota() && space != summary[3]) {
> NameNode.LOG.warn("Inconsistent diskspace for directory "
>   +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
> {code}
> However {{summary}} may already include diskspace information from other 
> inodes at this point (i.e. from different subtrees than the subtree of the 
> node for which the warning message is shown; in our example for the tree at 
> {{/hdfs-1377}}, {{summary}} can already contain information from 
> {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode 
> {{/hdfs-1377/C}}).  Hence the cached value for {{C}} can incorrectly be 
> different from the computed value.
> *How to fix*
> The supplied patch creates a fresh summary array for the subtree of the 
> current node.  The walk through the children passes and updates this 
> {{subtreeSummary}} array, and the condition is checked against 
> {{subtreeSummary}} instead of the original {{summary}}.  The original 
> {{summary}} is updated with the values of {{subtreeSummary}} before it 
> returns.
> *Unit Tests*
> I have run "ant test" on my patched build without any errors*.  However the 
> existing unit tests did not catch this issue for the original HDFS-1377 
> patch, so this might not mean anything. ;-)

[jira] [Updated] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)

2011-06-09 Thread Michael Noll (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Noll updated HDFS-2053:
---

Attachment: HDFS-2053_v2.txt

New patch version, now properly using 'git diff --no-prefix' to generate it. Doh!

> NameNode detects "Inconsistent diskspace" for directories with quota-enabled 
> subdirectories (introduced by HDFS-1377)
> -
>
> Key: HDFS-2053
> URL: https://issues.apache.org/jira/browse/HDFS-2053
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0
> Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch 
> applied.
> My impression is that the same issue exists also in the other branches where 
> the HDFS-1377 patch has been applied to (see description).
>Reporter: Michael Noll
>Priority: Minor
> Fix For: 0.20.3, 0.20.204.0, 0.20.205.0
>
> Attachments: HDFS-2053_v1.txt, HDFS-2053_v2.txt
>
>
> *How to reproduce*
> {code}
> # create test directories
> $ hadoop fs -mkdir /hdfs-1377/A
> $ hadoop fs -mkdir /hdfs-1377/B
> $ hadoop fs -mkdir /hdfs-1377/C
> # ...add some test data (few kB or MB) to all three dirs...
> # set space quota for subdir C only
> $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
> # the following two commands _on the parent dir_ trigger the warning
> $ hadoop fs -dus /hdfs-1377
> $ hadoop fs -count -q /hdfs-1377
> {code}
> Warning message in the namenode logs:
> {code}
> 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
> {code}
> Note that the commands are run on the _parent directory_ but the warning is 
> shown for the _subdirectory_ with space quota.
> *Background*
> The bug was introduced by the HDFS-1377 patch, which is currently committed 
> to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, 
> branch-0.20-security-205 and release-0.20.3-rc2.  In the patch, 
> {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was 
> updated to trigger the warning above if the cached and computed diskspace 
> values are not the same for a directory with quota.
> The warning is written by {{computeContentSummary(long[] summary)}} in 
> {{INodeDirectory}}. In the method an inode's children are recursively walked 
> through while the {{summary}} parameter is passed and updated along the way.
> {code}
>   /** {@inheritDoc} */
>   long[] computeContentSummary(long[] summary) {
> if (children != null) {
>   for (INode child : children) {
> child.computeContentSummary(summary);
>   }
> }
> {code}
> The condition that triggers the warning message compares the current node's 
> cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding 
> field in {{summary}}.
> {code}
>   if (-1 != node.getDsQuota() && space != summary[3]) {
> NameNode.LOG.warn("Inconsistent diskspace for directory "
>   +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
> {code}
> However {{summary}} may already include diskspace information from other 
> inodes at this point (i.e. from different subtrees than the subtree of the 
> node for which the warning message is shown; in our example for the tree at 
> {{/hdfs-1377}}, {{summary}} can already contain information from 
> {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode 
> {{/hdfs-1377/C}}).  Hence the cached value for {{C}} can incorrectly be 
> different from the computed value.
> *How to fix*
> The supplied patch creates a fresh summary array for the subtree of the 
> current node.  The walk through the children passes and updates this 
> {{subtreeSummary}} array, and the condition is checked against 
> {{subtreeSummary}} instead of the original {{summary}}.  The original 
> {{summary}} is updated with the values of {{subtreeSummary}} before it 
> returns.
> *Unit Tests*
> I have run "ant test" on my patched build without any errors*.  However the 
> existing unit tests did not catch this issue for the original HDFS-1377 
> patch, so this might not mean anything. ;-)
> That said I am unsure what the most appropriate way to unit test this issue 
> would be.  A straight-forward approach would be to automate the steps in the 
> _How to reproduce section_ above and check whether the NN logs an incorrect 
> warning message.  But I'm not sure how this check could be implemented.  Feel 
> free to provide some pointers if you have some ideas.
> *Note about Fix Version/s*
> The patch _should_ apply to all branches to which the HDFS-1377 patch has 
> been committed.  In my environment, the build was the Hadoop 0.20.203.0 
> release with a (trivial) backport of HDFS-1377 (the 0.20.203.0 release does 
> not ship with the HDFS-1377 fix). 

[jira] [Commented] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046490#comment-13046490
 ] 

Hadoop QA commented on HDFS-2053:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481915/HDFS-2053_v1.txt
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/750//console

This message is automatically generated.

> NameNode detects "Inconsistent diskspace" for directories with quota-enabled 
> subdirectories (introduced by HDFS-1377)
> -
>
> Key: HDFS-2053
> URL: https://issues.apache.org/jira/browse/HDFS-2053
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0
> Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch 
> applied.
> My impression is that the same issue exists also in the other branches where 
> the HDFS-1377 patch has been applied to (see description).
>Reporter: Michael Noll
>Priority: Minor
> Fix For: 0.20.3, 0.20.204.0, 0.20.205.0
>
> Attachments: HDFS-2053_v1.txt
>
>
> *How to reproduce*
> {code}
> # create test directories
> $ hadoop fs -mkdir /hdfs-1377/A
> $ hadoop fs -mkdir /hdfs-1377/B
> $ hadoop fs -mkdir /hdfs-1377/C
> # ...add some test data (few kB or MB) to all three dirs...
> # set space quota for subdir C only
> $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
> # the following two commands _on the parent dir_ trigger the warning
> $ hadoop fs -dus /hdfs-1377
> $ hadoop fs -count -q /hdfs-1377
> {code}
> Warning message in the namenode logs:
> {code}
> 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
> {code}
> Note that the commands are run on the _parent directory_ but the warning is 
> shown for the _subdirectory_ with space quota.
> *Background*
> The bug was introduced by the HDFS-1377 patch, which is currently committed 
> to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, 
> branch-0.20-security-205 and release-0.20.3-rc2.  In the patch, 
> {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was 
> updated to trigger the warning above if the cached and computed diskspace 
> values are not the same for a directory with quota.
> The warning is written by {{computeContentSummary(long[] summary)}} in 
> {{INodeDirectory}}. In the method an inode's children are recursively walked 
> through while the {{summary}} parameter is passed and updated along the way.
> {code}
>   /** {@inheritDoc} */
>   long[] computeContentSummary(long[] summary) {
> if (children != null) {
>   for (INode child : children) {
> child.computeContentSummary(summary);
>   }
> }
> {code}
> The condition that triggers the warning message compares the current node's 
> cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding 
> field in {{summary}}.
> {code}
>   if (-1 != node.getDsQuota() && space != summary[3]) {
> NameNode.LOG.warn("Inconsistent diskspace for directory "
>   +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
> {code}
> However {{summary}} may already include diskspace information from other 
> inodes at this point (i.e. from different subtrees than the subtree of the 
> node for which the warning message is shown; in our example for the tree at 
> {{/hdfs-1377}}, {{summary}} can already contain information from 
> {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode 
> {{/hdfs-1377/C}}).  Hence the cached value for {{C}} can incorrectly be 
> different from the computed value.
> *How to fix*
> The supplied patch creates a fresh summary array for the subtree of the 
> current node.  The walk through the children passes and updates this 
> {{subtreeSummary}} array, and the condition is checked against 
> {{subtreeSummary}} instead of the original {{summary}}.  The original 
> {{summary}} is updated with the values of {{subtreeSummary}} before it 
> returns.
> *Unit Tests*
> I have run "ant test" on my patched build without any errors*.  However the 
> existing unit tests did not catch this issue for the original HDFS-1377 
> patch, so this might not mean anything. ;-)

[jira] [Updated] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)

2011-06-09 Thread Michael Noll (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Noll updated HDFS-2053:
---

Fix Version/s: (was: 0.20.203.0)
   0.20.205.0
   0.20.204.0
Affects Version/s: (was: 0.20.203.0)
   0.20.205.0
   0.20.204.0
   Status: Patch Available  (was: Open)

Again, I am not sure how to properly identify the correct names of the 
versions.  For instance, the patch successfully applies to 
branch-0.20-security-204 but I am not sure whether this translates to version 
"0.20.204.0" in the dropdown list.

> NameNode detects "Inconsistent diskspace" for directories with quota-enabled 
> subdirectories (introduced by HDFS-1377)
> -
>
> Key: HDFS-2053
> URL: https://issues.apache.org/jira/browse/HDFS-2053
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0
> Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch 
> applied.
> My impression is that the same issue exists also in the other branches where 
> the HDFS-1377 patch has been applied to (see description).
>Reporter: Michael Noll
>Priority: Minor
> Fix For: 0.20.3, 0.20.204.0, 0.20.205.0
>
> Attachments: HDFS-2053_v1.txt
>
>
> *How to reproduce*
> {code}
> # create test directories
> $ hadoop fs -mkdir /hdfs-1377/A
> $ hadoop fs -mkdir /hdfs-1377/B
> $ hadoop fs -mkdir /hdfs-1377/C
> # ...add some test data (few kB or MB) to all three dirs...
> # set space quota for subdir C only
> $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
> # the following two commands _on the parent dir_ trigger the warning
> $ hadoop fs -dus /hdfs-1377
> $ hadoop fs -count -q /hdfs-1377
> {code}
> Warning message in the namenode logs:
> {code}
> 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
> {code}
> Note that the commands are run on the _parent directory_ but the warning is 
> shown for the _subdirectory_ with space quota.
> *Background*
> The bug was introduced by the HDFS-1377 patch, which is currently committed 
> to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, 
> branch-0.20-security-205 and release-0.20.3-rc2.  In the patch, 
> {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was 
> updated to trigger the warning above if the cached and computed diskspace 
> values are not the same for a directory with quota.
> The warning is written by {{computeContentSummary(long[] summary)}} in 
> {{INodeDirectory}}. In the method an inode's children are recursively walked 
> through while the {{summary}} parameter is passed and updated along the way.
> {code}
>   /** {@inheritDoc} */
>   long[] computeContentSummary(long[] summary) {
> if (children != null) {
>   for (INode child : children) {
> child.computeContentSummary(summary);
>   }
> }
> {code}
> The condition that triggers the warning message compares the current node's 
> cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding 
> field in {{summary}}.
> {code}
>   if (-1 != node.getDsQuota() && space != summary[3]) {
> NameNode.LOG.warn("Inconsistent diskspace for directory "
>   +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
> {code}
> However {{summary}} may already include diskspace information from other 
> inodes at this point (i.e. from different subtrees than the subtree of the 
> node for which the warning message is shown; in our example for the tree at 
> {{/hdfs-1377}}, {{summary}} can already contain information from 
> {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode 
> {{/hdfs-1377/C}}).  Hence the cached value for {{C}} can incorrectly be 
> different from the computed value.
> *How to fix*
> The supplied patch creates a fresh summary array for the subtree of the 
> current node.  The walk through the children passes and updates this 
> {{subtreeSummary}} array, and the condition is checked against 
> {{subtreeSummary}} instead of the original {{summary}}.  The original 
> {{summary}} is updated with the values of {{subtreeSummary}} before it 
> returns.
> *Unit Tests*
> I have run "ant test" on my patched build without any errors*.  However the 
> existing unit tests did not catch this issue for the original HDFS-1377 
> patch, so this might not mean anything. ;-)
> That said I am unsure what the most appropriate way to unit test this issue 
> would be.  A straight-forward approach would be to automate the steps in the 
> _How to reproduce section_ above and check whether the NN logs an incorrect 
> warning message. 

[jira] [Updated] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)

2011-06-09 Thread Michael Noll (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Noll updated HDFS-2053:
---

Attachment: HDFS-2053_v1.txt

Patch version 1 for HDFS-2053.

The patch should apply to all branches to which the original HDFS-1377 patch 
has been applied. See the ticket description for more details regarding "Fix 
Version/s".

> NameNode detects "Inconsistent diskspace" for directories with quota-enabled 
> subdirectories (introduced by HDFS-1377)
> -
>
> Key: HDFS-2053
> URL: https://issues.apache.org/jira/browse/HDFS-2053
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.3, 0.20.203.0
> Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch 
> applied.
> My impression is that the same issue exists also in the other branches where 
> the HDFS-1377 patch has been applied to (see description).
>Reporter: Michael Noll
>Priority: Minor
> Fix For: 0.20.3, 0.20.203.0
>
> Attachments: HDFS-2053_v1.txt
>
>
> *How to reproduce*
> {code}
> # create test directories
> $ hadoop fs -mkdir /hdfs-1377/A
> $ hadoop fs -mkdir /hdfs-1377/B
> $ hadoop fs -mkdir /hdfs-1377/C
> # ...add some test data (few kB or MB) to all three dirs...
> # set space quota for subdir C only
> $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
> # the following two commands _on the parent dir_ trigger the warning
> $ hadoop fs -dus /hdfs-1377
> $ hadoop fs -count -q /hdfs-1377
> {code}
> Warning message in the namenode logs:
> {code}
> 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
> {code}
> Note that the commands are run on the _parent directory_ but the warning is 
> shown for the _subdirectory_ with space quota.
> *Background*
> The bug was introduced by the HDFS-1377 patch, which is currently committed 
> to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, 
> branch-0.20-security-205 and release-0.20.3-rc2.  In the patch, 
> {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was 
> updated to trigger the warning above if the cached and computed diskspace 
> values are not the same for a directory with quota.
> The warning is written by {{computeContentSummary(long[] summary)}} in 
> {{INodeDirectory}}. In the method an inode's children are recursively walked 
> through while the {{summary}} parameter is passed and updated along the way.
> {code}
>   /** {@inheritDoc} */
>   long[] computeContentSummary(long[] summary) {
> if (children != null) {
>   for (INode child : children) {
> child.computeContentSummary(summary);
>   }
> }
> {code}
> The condition that triggers the warning message compares the current node's 
> cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding 
> field in {{summary}}.
> {code}
>   if (-1 != node.getDsQuota() && space != summary[3]) {
> NameNode.LOG.warn("Inconsistent diskspace for directory "
>   +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
> {code}
> However {{summary}} may already include diskspace information from other 
> inodes at this point (i.e. from different subtrees than the subtree of the 
> node for which the warning message is shown; in our example for the tree at 
> {{/hdfs-1377}}, {{summary}} can already contain information from 
> {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode 
> {{/hdfs-1377/C}}).  Hence the cached value for {{C}} can incorrectly be 
> different from the computed value.
> *How to fix*
> The supplied patch creates a fresh summary array for the subtree of the 
> current node.  The walk through the children passes and updates this 
> {{subtreeSummary}} array, and the condition is checked against 
> {{subtreeSummary}} instead of the original {{summary}}.  The original 
> {{summary}} is updated with the values of {{subtreeSummary}} before it 
> returns.
> *Unit Tests*
> I have run "ant test" on my patched build without any errors*.  However the 
> existing unit tests did not catch this issue for the original HDFS-1377 
> patch, so this might not mean anything. ;-)
> That said I am unsure what the most appropriate way to unit test this issue 
> would be.  A straight-forward approach would be to automate the steps in the 
> _How to reproduce section_ above and check whether the NN logs an incorrect 
> warning message.  But I'm not sure how this check could be implemented.  Feel 
> free to provide some pointers if you have some ideas.
> *Note about Fix Version/s*
> The patch _should_ apply to all branches to which the HDFS-1377 patch has 
> been committed.  In my environment, the build was the Hadoop 0.20.203.0 
> release with a (trivial) backport of HDFS-1377 (the 0.20.203.0 release does 
> not ship with the HDFS-1377 fix). 

[jira] [Created] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)

2011-06-09 Thread Michael Noll (JIRA)
NameNode detects "Inconsistent diskspace" for directories with quota-enabled 
subdirectories (introduced by HDFS-1377)
-

 Key: HDFS-2053
 URL: https://issues.apache.org/jira/browse/HDFS-2053
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.3, 0.20.203.0
 Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch 
applied.

My impression is that the same issue exists also in the other branches where 
the HDFS-1377 patch has been applied to (see description).
Reporter: Michael Noll
Priority: Minor
 Fix For: 0.20.3, 0.20.203.0


*How to reproduce*

{code}
# create test directories
$ hadoop fs -mkdir /hdfs-1377/A
$ hadoop fs -mkdir /hdfs-1377/B
$ hadoop fs -mkdir /hdfs-1377/C

# ...add some test data (few kB or MB) to all three dirs...

# set space quota for subdir C only
$ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C

# the following two commands _on the parent dir_ trigger the warning
$ hadoop fs -dus /hdfs-1377
$ hadoop fs -count -q /hdfs-1377
{code}

Warning message in the namenode logs:

{code}
2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
{code}

Note that the commands are run on the _parent directory_ but the warning is 
shown for the _subdirectory_ with space quota.

*Background*
The bug was introduced by the HDFS-1377 patch, which is currently committed to 
at least branch-0.20, branch-0.20-security, branch-0.20-security-204, 
branch-0.20-security-205 and release-0.20.3-rc2.  In the patch, 
{{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was 
updated to trigger the warning above if the cached and computed diskspace 
values are not the same for a directory with quota.

The warning is written by {{computeContentSummary(long[] summary)}} in 
{{INodeDirectory}}. In the method an inode's children are recursively walked 
through while the {{summary}} parameter is passed and updated along the way.

{code}
  /** {@inheritDoc} */
  long[] computeContentSummary(long[] summary) {
if (children != null) {
  for (INode child : children) {
child.computeContentSummary(summary);
  }
}
{code}

The condition that triggers the warning message compares the current node's 
cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding 
field in {{summary}}.

{code}
  if (-1 != node.getDsQuota() && space != summary[3]) {
NameNode.LOG.warn("Inconsistent diskspace for directory "
  +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
{code}

However {{summary}} may already include diskspace information from other inodes 
at this point (i.e. from different subtrees than the subtree of the node for 
which the warning message is shown; in our example for the tree at 
{{/hdfs-1377}}, {{summary}} can already contain information from 
{{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode 
{{/hdfs-1377/C}}).  Hence the cached value for {{C}} can incorrectly be 
different from the computed value.

*How to fix*

The supplied patch creates a fresh summary array for the subtree of the current 
node.  The walk through the children passes and updates this {{subtreeSummary}} 
array, and the condition is checked against {{subtreeSummary}} instead of the 
original {{summary}}.  The original {{summary}} is updated with the values of 
{{subtreeSummary}} before it returns.
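
Pieced together from the snippets above, here is a minimal sketch of that fix. 
The surrounding bookkeeping is elided and the exact field handling is an 
assumption, so treat this as an illustration rather than the patch itself:

{code}
/** Hedged sketch of the fix described above, not the exact patch. */
long[] computeContentSummary(long[] summary) {
  // Fresh array so the quota check below cannot see diskspace that
  // sibling subtrees (e.g. /hdfs-1377/A and B) already added to summary.
  long[] subtreeSummary = new long[summary.length];
  if (children != null) {
    for (INode child : children) {
      child.computeContentSummary(subtreeSummary);
    }
  }
  long space = diskspaceConsumed();
  if (-1 != getDsQuota() && space != subtreeSummary[3]) {
    NameNode.LOG.warn("Inconsistent diskspace for directory "
        + getLocalName() + ". Cached: " + space
        + " Computed: " + subtreeSummary[3]);
  }
  // ...this directory's own counts would be added to subtreeSummary here...
  for (int i = 0; i < summary.length; i++) {
    summary[i] += subtreeSummary[i];  // merge back into the caller's totals
  }
  return summary;
}
{code}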

*Unit Tests*

I have run "ant test" on my patched build without any errors*.  However the 
existing unit tests did not catch this issue for the original HDFS-1377 patch, 
so this might not mean anything. ;-)

That said I am unsure what the most appropriate way to unit test this issue 
would be.  A straight-forward approach would be to automate the steps in the 
_How to reproduce section_ above and check whether the NN logs an incorrect 
warning message.  But I'm not sure how this check could be implemented.  Feel 
free to provide some pointers if you have some ideas.

*Note about Fix Version/s*

The patch _should_ apply to all branches to which the HDFS-1377 patch has been 
committed.  In my environment, the build was the Hadoop 0.20.203.0 release with 
a (trivial) backport of HDFS-1377 (the 0.20.203.0 release does not ship with 
the HDFS-1377 fix).  I could apply the patch successfully to 
{{branch-0.20-security}}, {{branch-0.20-security-204}} and 
{{release-0.20.3-rc2}}, for instance.  Since I'm a bit confused regarding the 
upcoming 0.20.x release versions (0.20.x vs. 0.20.20x.y), I have been so bold 
as to add 0.20.203.0 to the list of affected versions even though it is 
actually only affected when HDFS-1377 is applied to it...

Best,
Michael


*Well, I get one error for {{TestRumenJobTraces}} but first this see

[jira] [Commented] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046429#comment-13046429
 ] 

Hadoop QA commented on HDFS-2003:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481897/HDFS-2003.diff
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/749//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/749//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/749//console

This message is automatically generated.

> Separate FSEditLog reading logic from editLog memory state building logic
> -
>
> Key: HDFS-2003
> URL: https://issues.apache.org/jira/browse/HDFS-2003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: 2003-delta.txt, HDFS-2003-replicationfix-delta.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> hdfs-2003.txt, hdfs-2003.txt, hdfs-2003.txt
>
>
> Currently FSEditLogLoader has code for reading from an InputStream 
> interleaved with code which updates the FSNameSystem and FSDirectory. This 
> makes it difficult to read an edit log without having a whole load of other 
> objects initialised, which is problematic if you want to do things like count 
> how many transactions are in a file, etc. 
> This patch separates the reading of the stream and the building of the memory 
> state. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046422#comment-13046422
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481896/hdfs-941.txt
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/748//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/748//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/748//console

This message is automatically generated.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046397#comment-13046397
 ] 

Hadoop QA commented on HDFS-988:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481892/hdfs-988-6.patch
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 33 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/747//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/747//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/747//console

This message is automatically generated.

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch, 
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, 
> hdfs-988-6.patch, hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, 
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853
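
For readers skimming the digest, here is a self-contained toy of the race 
described in that summary. It shows why the logSync path and saveNamespace 
must share a lock; it is an illustration only, not the HDFS-988 patch:

{code}
// Hedged toy illustration, not the HDFS-988 fix. The point: if the
// snapshot does not hold the same lock as the sync path, it can copy a
// half-appended record (a "partial write") into the saved edits.
public class SaveNamespaceRace {
  private final Object journalLock = new Object();
  private final StringBuilder edits = new StringBuilder();

  // Stands in for logEdit+logSync, called from many handler threads.
  public void logEdit(String record) {
    synchronized (journalLock) {
      edits.append(record).append('\n');
    }
  }

  // Safe only because it takes journalLock: no sync can be in flight
  // while the snapshot is taken.
  public String saveNamespace() {
    synchronized (journalLock) {
      return edits.toString();
    }
  }
}
{code}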

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic

2011-06-09 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-2003:
-

Attachment: HDFS-2003.diff

Addressed the two things from Todd's previous comment.
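
As a rough picture of the separation this patch is after (the names below are 
invented for illustration and record payload decoding is elided; the real 
classes in the patch may differ):

{code}
import java.io.DataInputStream;
import java.io.IOException;

// The reader only decodes records from the stream...
interface EditOpVisitor {
  void visit(byte opcode) throws IOException;
}

class EditLogScanner {
  // ...and the visitor decides what to do with them, so counting
  // transactions needs no FSNamesystem or FSDirectory at all.
  static long scan(DataInputStream in, EditOpVisitor visitor)
      throws IOException {
    long count = 0;
    int op;
    while ((op = in.read()) != -1) {   // payload decoding elided
      visitor.visit((byte) op);
      count++;
    }
    return count;
  }
}
{code}

With this split, counting the transactions in a file is just a scan with a 
visitor that does nothing, with none of the other state initialised.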

> Separate FSEditLog reading logic from editLog memory state building logic
> -
>
> Key: HDFS-2003
> URL: https://issues.apache.org/jira/browse/HDFS-2003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: 2003-delta.txt, HDFS-2003-replicationfix-delta.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> hdfs-2003.txt, hdfs-2003.txt, hdfs-2003.txt
>
>
> Currently FSEditLogLoader has code for reading from an InputStream 
> interleaved with code which updates the FSNameSystem and FSDirectory. This 
> makes it difficult to read an edit log without having a whole load of other 
> objects initialised, which is problematic if you want to do things like count 
> how many transactions are in a file, etc. 
> This patch separates the reading of the stream and the building of the memory 
> state. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046389#comment-13046389
 ] 

Hadoop QA commented on HDFS-988:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481892/hdfs-988-6.patch
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 33 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/746//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/746//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/746//console

This message is automatically generated.

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch, 
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, 
> hdfs-988-6.patch, hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, 
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

