[jira] [Commented] (HDFS-2305) Running multiple 2NNs can result in corrupt file system

2011-09-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095810#comment-13095810
 ] 

Hadoop QA commented on HDFS-2305:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12492676/hdfs-2305.0.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1186//console

This message is automatically generated.

 Running multiple 2NNs can result in corrupt file system
 ---

 Key: HDFS-2305
 URL: https://issues.apache.org/jira/browse/HDFS-2305
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: hdfs-2305-test.patch, hdfs-2305.0.patch


 Here's the scenario:
 * You run the NN and 2NN (2NN A) on the same machine.
 * You don't have the address of the 2NN configured, so it's defaulting to 
 127.0.0.1.
 * There's another 2NN (2NN B) running on a second machine.
 * When a 2NN is done checkpointing, it says "hey NN, I have an updated 
 fsimage for you. You can download it from this URL, which includes my IP 
 address, which is x."
 And here are the steps that occur to cause this issue:
 # Some edits happen.
 # 2NN A (on the NN machine) does a checkpoint. All is dandy.
 # Some more edits happen.
 # 2NN B (on a different machine) does a checkpoint. It tells the NN "grab the 
 newly-merged fsimage file from 127.0.0.1."
 # NN happily grabs the fsimage from 2NN A (the 2NN on the NN machine), which 
 is stale.
 # NN renames edits.new file to edits. At this point the in-memory FS state is 
 fine, but the on-disk state is missing edits.
 # The next time a 2NN (any 2NN) tries to do a checkpoint, it gets an 
 up-to-date edits file, with an outdated fsimage, and tries to apply those 
 edits to that fsimage.
 # Kaboom.
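The race described in the steps above can be sketched as a small simulation. This is a hypothetical illustration, not Hadoop code; the addresses, transaction IDs, and function names are invented for the example.

```python
# Hypothetical sketch: simulates why a 2NN that advertises the default
# address 127.0.0.1 makes the NN fetch a stale fsimage from the colocated
# 2NN instead of from the remote 2NN that actually just checkpointed.

# fsimage copies held by each 2NN, keyed by the address each 2NN advertises.
images_by_address = {
    "127.0.0.1": "fsimage@txn100",  # 2NN A, colocated with the NN (stale)
    "10.0.0.2": "fsimage@txn200",   # 2NN B, on a second machine (fresh)
}

def nn_fetch_checkpoint(advertised_address):
    """NN side: download the fsimage from whatever address the 2NN advertised."""
    return images_by_address[advertised_address]

# 2NN B finishes a checkpoint but, lacking explicit configuration, advertises
# the default 127.0.0.1 -- so the NN fetches 2NN A's stale image instead,
# and the on-disk state loses every edit past txn100 once edits.new is renamed.
fetched = nn_fetch_checkpoint("127.0.0.1")
assert fetched == "fsimage@txn100"
```

The fix direction suggested by the bug is to make the 2NN advertise a real, configured address rather than the loopback default.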

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095812#comment-13095812
 ] 

Hadoop QA commented on HDFS-2299:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12492646/HDFS-2299.1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1187//console

This message is automatically generated.

 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.patch, HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-09-02 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-1973:
-

Attachment: hdfs-1973.0.patch

Here's a preliminary patch (not intended for commit) to give people an idea of 
how this will work.

The main things this is missing before it could reasonably be committed are:

# It currently doesn't handle clean-up of fail-over client resources at all. 
The way RPC resource cleanup currently works is by looking up the appropriate 
RPCEngine given a protocol class, and leaving clean-up to that class's 
InvocationHandler. This implicitly assumes that there is a one-to-one mapping 
from protocol class to invocation handler, which is no longer true. It's not 
obvious to me at the moment what the best way to deal with this is.
# Currently only one of the {{ClientProtocol}} methods is annotated with the 
@Idempotent annotation.
# It currently doesn't handle concurrent connections at all.
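The first item above can be illustrated with a toy sketch. This is not the actual Hadoop RPC code; the registry, names, and handler strings are invented to show why a one-to-one protocol-to-handler map breaks once a failover proxy introduces multiple handlers per protocol.

```python
# Hypothetical sketch: resource cleanup that looks up a single invocation
# handler per protocol class cannot clean up a failover client that holds
# one handler per namenode.

handlers_by_protocol = {}

def register(protocol, handler):
    # One-to-one assumption baked in: a second registration for the same
    # protocol silently replaces the first entry.
    handlers_by_protocol[protocol] = handler

register("ClientProtocol", "handler-for-nn1")
register("ClientProtocol", "handler-for-nn2")  # failover adds a second handler

# Cleanup that walks this map only ever sees the last handler registered,
# so the connection behind handler-for-nn1 is leaked.
assert handlers_by_protocol["ClientProtocol"] == "handler-for-nn2"
assert "handler-for-nn1" not in handlers_by_protocol.values()
```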

 HA: HDFS clients must handle namenode failover and switch over to the new 
 active namenode.
 --

 Key: HDFS-1973
 URL: https://issues.apache.org/jira/browse/HDFS-1973
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Suresh Srinivas
Assignee: Aaron T. Myers
 Attachments: hdfs-1973.0.patch


 During failover, a client must detect the current active namenode failure and 
 switch over to the new active namenode. The switch over might make use of IP 
 failover or some thing more elaborate such as zookeeper to discover the new 
 active.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095944#comment-13095944
 ] 

Hudson commented on HDFS-2299:
--

Integrated in Hadoop-Hdfs-trunk #780 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/780/])
HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G 
via atm)

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164192
Files : 
* /hadoop/common/trunk/dev-support/test-patch.properties
* /hadoop/common/trunk/hadoop-hdfs-project/dev-support/test-patch.properties
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml


 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.patch, HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095956#comment-13095956
 ] 

Hudson commented on HDFS-2299:
--

Integrated in Hadoop-Mapreduce-trunk #804 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/804/])
HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G 
via atm)

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164192
Files : 
* /hadoop/common/trunk/dev-support/test-patch.properties
* /hadoop/common/trunk/hadoop-hdfs-project/dev-support/test-patch.properties
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml


 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.patch, HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2299:
--

Status: Open  (was: Patch Available)

 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, 
 HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2299:
--

Attachment: HDFS-2299.2.patch

 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, 
 HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096013#comment-13096013
 ] 

Uma Maheswara Rao G commented on HDFS-2299:
---

Hey Aaron, yes, I ran test-patch. It did not show any warnings.
It looks to me that we don't even need to add the exclude tag to the RAT 
configuration, because HDFS already has that path excluded. Because of that, it 
showed no warnings whether or not I added the exclude in common; adding the 
exclude tag in common has no effect.

In HDFS pom.xml:

{code}
<exclude>src/test/resources/data*</exclude>
<exclude>src/test/resources/editStored*</exclude>
<exclude>src/test/resources/empty-file</exclude>
{code}

I just removed the Apache header from editStored.xml.

Test-Patch results:

 +1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit. The applied patch does not increase the total number of 
release audit warnings.


Thanks 
Uma

 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, 
 HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2012) Recurring failure of TestBalancer on branch-0.22

2011-09-02 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096026#comment-13096026
 ] 

Uma Maheswara Rao G commented on HDFS-2012:
---

Hi Konstantin/Aaron,
 It is passing on my local box. 
 Is this test still failing on your local boxes?

Thanks
Uma

 Recurring failure of TestBalancer on branch-0.22
 

 Key: HDFS-2012
 URL: https://issues.apache.org/jira/browse/HDFS-2012
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer, test
Affects Versions: 0.22.0
Reporter: Aaron T. Myers
Priority: Blocker
 Fix For: 0.22.0


 This has been failing on Hudson for the last two builds and fails on my local 
 box as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-2302) HDFS logs not being rotated

2011-09-02 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash resolved HDFS-2302.


Resolution: Invalid

In commit 8b0430307b662a1533686aeefa0760380b7c5182 the logs are being 
automated. Marking as invalid.

 HDFS logs not being rotated
 ---

 Key: HDFS-2302
 URL: https://issues.apache.org/jira/browse/HDFS-2302
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ravi Prakash

 In commit c5edca2b15eca7c0bd568a0017f699ac91b8aebf, the logs for the 
 namenode, datanode and secondarynamenode are being written to .out files and 
 are not being rotated after one day. IMHO, rotation of logs is important.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2302) HDFS logs not being rotated

2011-09-02 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096035#comment-13096035
 ] 

Ravi Prakash commented on HDFS-2302:


*rotated I meant


 HDFS logs not being rotated
 ---

 Key: HDFS-2302
 URL: https://issues.apache.org/jira/browse/HDFS-2302
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ravi Prakash

 In commit c5edca2b15eca7c0bd568a0017f699ac91b8aebf, the logs for the 
 namenode, datanode and secondarynamenode are being written to .out files and 
 are not being rotated after one day. IMHO, rotation of logs is important.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1161) Make DN minimum valid volumes configurable

2011-09-02 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096099#comment-13096099
 ] 

Eli Collins commented on HDFS-1161:
---

That's reasonable. The list of volumes (local dirs) is explicitly listed, so 
the config isn't portable even when specified as a percent, but it's one less 
config that isn't portable.

IIRC Koji's perspective was that an admin doesn't want to specify the count or 
percent of valid volumes, but that after a set number of failures the host 
should be considered faulty. E.g. if it's lost two disks there's probably 
something wrong whether the host has 6 or 12 disks; i.e. this assumes disk 
failures within a host are correlated.

Ideally I think we should collect data (e.g. an X-core host can still function 
well with Y% of its disks) and not require users to configure this at all - it 
would be enabled by default and the daemons would take themselves offline when 
they've determined they don't have sufficient resources.
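The two policies being contrasted above can be sketched as follows. This is an illustrative toy, not Hadoop code; the function names and thresholds are invented for the comparison.

```python
# Hypothetical sketch of the two policies discussed: taking a DataNode
# offline after a fixed number of failed volumes vs. after a percentage
# of failed volumes.

def faulty_by_count(total_volumes, failed_volumes, max_failures=2):
    """Count-based: N failures suggests a sick host regardless of disk count
    (the view that disk failures within a host are correlated)."""
    return failed_volumes >= max_failures

def faulty_by_percent(total_volumes, failed_volumes, max_failed_pct=50):
    """Percent-based: one threshold can be shared across hosts with
    different disk counts, making the config more portable."""
    return failed_volumes * 100 >= total_volumes * max_failed_pct

# Two failed disks: the count policy flags both a 6-disk and a 12-disk host,
# while the 50% policy flags neither.
assert faulty_by_count(6, 2) and faulty_by_count(12, 2)
assert not faulty_by_percent(6, 2) and not faulty_by_percent(12, 2)
```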



 Make DN minimum valid volumes configurable
 --

 Key: HDFS-1161
 URL: https://issues.apache.org/jira/browse/HDFS-1161
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.21.0, 0.22.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.21.0

 Attachments: HDFS-1161-y20.patch, hdfs-1161-1.patch, 
 hdfs-1161-2.patch, hdfs-1161-3.patch, hdfs-1161-4.patch, hdfs-1161-5.patch, 
 hdfs-1161-6.patch


 The minimum number of non-faulty volumes to keep the DN active is hard-coded 
 to 1. It would be useful to allow users to configure this value so the DN 
 can be taken offline when, e.g., half of its disks fail; otherwise the 
 failure doesn't get reported until the DN is down to its final disk and 
 suffering degraded performance.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-2299.
--

Resolution: Fixed

Thanks a lot for reworking the patch, Uma. I've just committed the latest one.

 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, 
 HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096121#comment-13096121
 ] 

Hudson commented on HDFS-2299:
--

Integrated in Hadoop-Common-trunk-Commit #824 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/824/])
HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G 
via atm)

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164624
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml


 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, 
 HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096122#comment-13096122
 ] 

Hudson commented on HDFS-2299:
--

Integrated in Hadoop-Hdfs-trunk-Commit #901 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/901/])
HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G 
via atm)

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164624
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml


 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, 
 HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2284) RW Http access to HDFS

2011-09-02 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2284:
-

Attachment: h2284_20110902b.patch

h2284_20110902b.patch: A patch for preview.  It only has HttpFileSystem 
(httpfs://) with mkdirs and getFileStatus.

 RW Http access to HDFS
 --

 Key: HDFS-2284
 URL: https://issues.apache.org/jira/browse/HDFS-2284
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sanjay Radia
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h2284_20110902b.patch


 HFTP allows only read access to HDFS via HTTP. Add RW (read/write) HTTP 
 access to HDFS.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk

2011-09-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096125#comment-13096125
 ] 

Hudson commented on HDFS-2299:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #834 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/834/])
HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G 
via atm)

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164624
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml


 TestOfflineEditsViewer is failing on trunk
 --

 Key: HDFS-2299
 URL: https://issues.apache.org/jira/browse/HDFS-2299
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
Assignee: Uma Maheswara Rao G
 Fix For: 0.24.0

 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, 
 HDFS-2299.patch


 The relevant bit of the error:
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
 ---
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec  
 FAILURE!
 testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
   Time elapsed: 0.038 sec   FAILURE!
 java.lang.AssertionError: Reference XML edits and parsed to XML should be same
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2307) More Coverage needed for FSDirectory

2011-09-02 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-2307:
--

Fix Version/s: 0.22.0

 More Coverage needed for FSDirectory
 

 Key: HDFS-2307
 URL: https://issues.apache.org/jira/browse/HDFS-2307
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 0.22.0
Reporter: Benoy Antony
 Fix For: 0.22.0

 Attachments: 59.html


 The unit tests do not cover some of the symlink logic in FSDirectory. 
 The impact of adding a symlink on the nameQuota is not covered.
 The unit test coverage for FSDirectory is attached.  The uncovered lines are 
 in  addToParent function. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2284) RW Http access to HDFS

2011-09-02 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096152#comment-13096152
 ] 

Todd Lipcon commented on HDFS-2284:
---

If this is HDFS-specific, could we make the scheme something like hdfs+http:// 
or http+hdfs:// to indicate the encapsulation? I always found "hftp://" to be 
very confusing to users who thought it had something to do with FTP. I can see 
users being equally confused if they try to do {{hadoop fs -cat 
httpfs://myserver/path/to/tarball.tgz}}.

 RW Http access to HDFS
 --

 Key: HDFS-2284
 URL: https://issues.apache.org/jira/browse/HDFS-2284
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sanjay Radia
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h2284_20110902b.patch


 HFTP allows only read access to HDFS via HTTP. Add RW (read/write) HTTP 
 access to HDFS.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1779) After NameNode restart , Clients can not read partial files even after client invokes Sync.

2011-09-02 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1779:


Attachment: bbwReportAppend.patch

Here comes the patch:
1. It adds a new RPC, blocksBeingWrittenReport, that allows a datanode to send 
a report of bbw blocks.
2. The DataNode sends a bbw block report when registering with the NameNode.
3. FSDatasetInterface is enhanced with a getBlocksBeingWrittenReport API.
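The idea behind the bbw report can be sketched as a toy NameNode-side handler. This is a hypothetical illustration, not the patch itself; the block IDs, datanode names, and handler function are invented for the example.

```python
# Hypothetical sketch: after a restart the NN knows which blocks were
# persisted via sync, but not which DataNodes hold them. A blocks-being-
# written (bbw) report from each registering DN rebuilds that mapping.

persisted_blocks = {"blk_1", "blk_2", "blk_3", "blk_4", "blk_5"}
block_locations = {}  # block id -> set of datanodes, rebuilt from bbw reports

def handle_bbw_report(datanode, bbw_blocks):
    """NN side: record a location for each persisted block a DN reports
    as being written."""
    for blk in bbw_blocks:
        if blk in persisted_blocks:
            block_locations.setdefault(blk, set()).add(datanode)

# Each DN sends its bbw report when it registers after the NN restart.
handle_bbw_report("dn1", ["blk_1", "blk_2", "blk_3"])
handle_bbw_report("dn2", ["blk_3", "blk_4", "blk_5"])

# The NN can now serve reads of the partially written file: every persisted
# block has at least one known location again.
assert block_locations["blk_3"] == {"dn1", "dn2"}
```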


 After NameNode restart , Clients can not read partial files even after client 
 invokes Sync.
 ---

 Key: HDFS-1779
 URL: https://issues.apache.org/jira/browse/HDFS-1779
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, name-node
Affects Versions: 0.20-append
 Environment: Linux
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 0.20-append

 Attachments: HDFS-1779.1.patch, HDFS-1779.patch, bbwReportAppend.patch


 In the append (HDFS-200) work:
 If a file has 10 blocks and the client invokes sync after writing 5 blocks, 
 the NN will persist the block information in the edits log. 
 If we then restart the NN, all the DataNodes will re-register with the NN, 
 but they do not send their blocks-being-written information at registration; 
 DNs only send the blocksBeingWritten information at DN startup. So the 
 NameNode cannot determine which datanodes the 5 persisted blocks belong to. 
 This information can be built from block reports from the DNs. Otherwise we 
 will lose those 5 blocks even though the NN persisted their information in 
 the edits log. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1779) After NameNode restart , Clients can not read partial files even after client invokes Sync.

2011-09-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096163#comment-13096163
 ] 

Hadoop QA commented on HDFS-1779:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12492778/bbwReportAppend.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1188//console

This message is automatically generated.

 After NameNode restart , Clients can not read partial files even after client 
 invokes Sync.
 ---

 Key: HDFS-1779
 URL: https://issues.apache.org/jira/browse/HDFS-1779
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, name-node
Affects Versions: 0.20-append
 Environment: Linux
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 0.20-append

 Attachments: HDFS-1779.1.patch, HDFS-1779.patch, bbwReportAppend.patch


 In the append work (HDFS-200): if a file has 10 blocks and the client 
 invokes the sync method after writing 5 blocks, the NN persists the block 
 information in the edits log. If we then restart the NN, all the DataNodes 
 re-register with the NN, but they are not sending their blocks-being-written 
 information to the NN; the DNs should send the blocksBeingWritten 
 information at DN startup. So the NameNode cannot determine which DataNodes 
 the 5 persisted blocks belong to. This information can be built from the 
 DNs' block reports; otherwise we will lose the information for these 5 
 blocks even though the NN persisted it in the edits log.
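The missing mapping can be pictured with a toy sketch (the class and method names below are invented for illustration, not actual HDFS code): block IDs persisted in the edit log resolve to no DataNode locations until each DN reports the blocks it is writing.

```java
import java.util.*;

// Toy illustration (invented names, not HDFS internals): persisted block
// IDs have no known locations until DataNodes report them.
class BlockLocationSketch {
    private final Map<Long, Set<String>> locations = new HashMap<>();

    // Called when a DataNode reports the blocks it is still writing.
    void processBlockReport(String datanodeId, List<Long> blockIds) {
        for (long id : blockIds) {
            locations.computeIfAbsent(id, k -> new HashSet<>()).add(datanodeId);
        }
    }

    // Before any report arrives, a persisted block resolves to nothing,
    // which is exactly the state the NN is stuck in after a restart.
    Set<String> locationsOf(long blockId) {
        return locations.getOrDefault(blockId, Collections.emptySet());
    }

    public static void main(String[] args) {
        BlockLocationSketch nn = new BlockLocationSketch();
        System.out.println(nn.locationsOf(42L)); // [] until a DN reports
        nn.processBlockReport("dn1", Arrays.asList(42L, 43L));
        System.out.println(nn.locationsOf(42L)); // [dn1]
    }
}
```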





[jira] [Commented] (HDFS-2018) 1073: Move all journal stream management code into one place

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096160#comment-13096160
 ] 

Jitendra Nath Pandey commented on HDFS-2018:


Todd, are you OK with committing this now? The patch is in line with what we 
agreed on before.

 1073: Move all journal stream management code into one place
 

 Key: HDFS-2018
 URL: https://issues.apache.org/jira/browse/HDFS-2018
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, hdfs-2018-otherapi.txt, 
 hdfs-2018.txt


 Currently in the HDFS-1073 branch, the code for creating output streams is in 
 FileJournalManager and the code for input streams is in the inspectors. This 
 change does a number of things.
   - Input and Output streams are now created by the JournalManager.
   - FSImageStorageInspectors now deals with URIs when referring to edit logs
   - Recovery of inprogress logs is performed by counting the number of 
 transactions instead of looking at the length of the file.
 The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.
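The transaction-counting recovery mentioned in the last bullet can be sketched as follows (the record format and names below are invented for illustration; the real edit-log format differs): read complete records until the stream runs out, and treat a truncated final record as the recovery point instead of trusting the file length.

```java
import java.io.*;

// Toy recovery sketch: count complete transactions in a possibly
// truncated log. Record format (8-byte txid, 4-byte payload length,
// payload) is invented for illustration.
class TxCountSketch {
    static int countCompleteTransactions(byte[] log) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        int count = 0;
        try {
            while (in.available() > 0) {
                in.readLong();               // txid
                int len = in.readInt();      // payload length
                if (in.skipBytes(len) < len) // truncated payload: stop
                    break;
                count++;
            }
        } catch (EOFException truncated) {
            // a partial final record marks where recovery stops
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (long tx = 1; tx <= 2; tx++) {
            out.writeLong(tx);       // txid
            out.writeInt(3);         // payload length
            out.write(new byte[3]);  // payload
        }
        out.writeLong(3L); // simulate a crash mid-record
        System.out.println(countCompleteTransactions(bos.toByteArray())); // 2
    }
}
```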





[jira] [Updated] (HDFS-2232) TestHDFSCLI fails on 0.22 branch

2011-09-02 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-2232:
---

Attachment: HDFS-2232.patch

Patch for trunk.

 TestHDFSCLI fails on 0.22 branch
 

 Key: HDFS-2232
 URL: https://issues.apache.org/jira/browse/HDFS-2232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2232.patch, HDFS-2232.patch, HDFS-2232.patch, 
 HDFS-2232.patch, TEST-org.apache.hadoop.cli.TestHDFSCLI.txt, 
 TEST-org.apache.hadoop.cli.TestHDFSCLI.txt


 Several HDFS CLI tests fail on the 0.22 branch. I can see three reasons:
 # The regular expressions for host names and paths are not generic enough. 
 Similar to MAPREDUCE-2304.
 # Some command outputs have a newline at the end.
 # Some seem to produce [much] more output than expected.
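The first two reasons suggest fixes of this shape (a hedged sketch with invented names, not the actual TestHDFSCLI comparator classes): a host pattern general enough for arbitrary host names and ports, and normalization of any trailing newline before comparing actual output to the expected regex.

```java
import java.util.regex.Pattern;

// Sketch of the kind of fix described: a generic host-name pattern and
// trailing-newline normalization for CLI expectation matching.
class CliExpectationSketch {
    // Matches a plain host name or host:port in command output.
    static final Pattern HOST = Pattern.compile("[-.a-zA-Z0-9]+(:[0-9]+)?");

    static boolean matches(String actual, String expectedRegex) {
        // Strip the trailing newline some commands append before matching.
        return Pattern.matches(expectedRegex, actual.replaceAll("\\n+$", ""));
    }

    public static void main(String[] args) {
        System.out.println(matches("namenode-01.example.com:8020\n",
                HOST.pattern())); // true
    }
}
```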





[jira] [Updated] (HDFS-2281) NPE in checkpoint during processIOError()

2011-09-02 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2281:
--

Status: Open  (was: Patch Available)

 NPE in checkpoint during processIOError()
 -

 Key: HDFS-2281
 URL: https://issues.apache.org/jira/browse/HDFS-2281
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Uma Maheswara Rao G
 Fix For: 0.22.0

 Attachments: BN-bug-NPE.txt, HDFS-2281.patch


 At the end of a checkpoint, the BackupNode tries to convergeJournalSpool() 
 and calls revertFileStreams(). The latter closes each file stream and tries 
 to rename the corresponding file to its permanent location, current/edits. 
 If for any reason the rename fails, processIOError() is called for the 
 failed streams. processIOError() will try to close the stream again and will 
 get an NPE in EditLogFileOutputStream.close(), because bufCurrent was set to 
 null by the previous close.
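One common shape for this kind of fix (a minimal sketch with an invented class, not the real EditLogFileOutputStream) is to make close() idempotent, so a second call returns quietly instead of dereferencing the buffer that the first close() already set to null.

```java
import java.io.IOException;

// Minimal sketch (invented class, not HDFS code): guard close() so a
// second call is a no-op rather than an NPE on the nulled buffer.
class IdempotentCloseSketch {
    private byte[] bufCurrent = new byte[512];

    void close() throws IOException {
        if (bufCurrent == null) {
            return; // already closed; avoid the NPE described above
        }
        // ... flush bufCurrent to disk here ...
        bufCurrent = null; // release the buffer exactly once
    }

    public static void main(String[] args) throws IOException {
        IdempotentCloseSketch s = new IdempotentCloseSketch();
        s.close();
        s.close(); // second close is safe
        System.out.println("double close handled");
    }
}
```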





[jira] [Commented] (HDFS-2284) RW Http access to HDFS

2011-09-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096166#comment-13096166
 ] 

Allen Wittenauer commented on HDFS-2284:


webhdfs://  (just like webnfs://)



 RW Http access to HDFS
 --

 Key: HDFS-2284
 URL: https://issues.apache.org/jira/browse/HDFS-2284
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sanjay Radia
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h2284_20110902b.patch


 HFTP allows on read access to HDFS via HTTP. Add RW HTTP access to HDFS)





[jira] [Updated] (HDFS-2281) NPE in checkpoint during processIOError()

2011-09-02 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2281:
--

Attachment: HDFS-2281.1.patch

 NPE in checkpoint during processIOError()
 -

 Key: HDFS-2281
 URL: https://issues.apache.org/jira/browse/HDFS-2281
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Uma Maheswara Rao G
 Fix For: 0.22.0

 Attachments: BN-bug-NPE.txt, HDFS-2281.1.patch, HDFS-2281.patch


 At the end of a checkpoint, the BackupNode tries to convergeJournalSpool() 
 and calls revertFileStreams(). The latter closes each file stream and tries 
 to rename the corresponding file to its permanent location, current/edits. 
 If for any reason the rename fails, processIOError() is called for the 
 failed streams. processIOError() will try to close the stream again and will 
 get an NPE in EditLogFileOutputStream.close(), because bufCurrent was set to 
 null by the previous close.





[jira] [Updated] (HDFS-2281) NPE in checkpoint during processIOError()

2011-09-02 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2281:
--

Status: Patch Available  (was: Open)

Hi Konstantin,

Thanks a lot for taking a look.

This code has been refactored in trunk, so I included those changes as part 
of the port. We can actually make use of processIOError alone, so I now use 
processIOError to handle the failed storage directories.

Konstantin, can you take a look?

Test-patch results:

[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac 
compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number 
of release audit warnings.
[exec]
[exec] +1 system test framework. The patch passed system test framework compile.

==
 Finished
==

Thanks
Uma

 NPE in checkpoint during processIOError()
 -

 Key: HDFS-2281
 URL: https://issues.apache.org/jira/browse/HDFS-2281
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Uma Maheswara Rao G
 Fix For: 0.22.0

 Attachments: BN-bug-NPE.txt, HDFS-2281.1.patch, HDFS-2281.patch


 At the end of a checkpoint, the BackupNode tries to convergeJournalSpool() 
 and calls revertFileStreams(). The latter closes each file stream and tries 
 to rename the corresponding file to its permanent location, current/edits. 
 If for any reason the rename fails, processIOError() is called for the 
 failed streams. processIOError() will try to close the stream again and will 
 get an NPE in EditLogFileOutputStream.close(), because bufCurrent was set to 
 null by the previous close.





[jira] [Commented] (HDFS-2018) 1073: Move all journal stream management code into one place

2011-09-02 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096175#comment-13096175
 ] 

Todd Lipcon commented on HDFS-2018:
---

OK. I still think it's a worse API than the other patch I had attached a while 
back, for the reasons mentioned above, but I won't block it. Go ahead and 
commit.

 1073: Move all journal stream management code into one place
 

 Key: HDFS-2018
 URL: https://issues.apache.org/jira/browse/HDFS-2018
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, hdfs-2018-otherapi.txt, 
 hdfs-2018.txt


 Currently in the HDFS-1073 branch, the code for creating output streams is in 
 FileJournalManager and the code for input streams is in the inspectors. This 
 change does a number of things.
   - Input and Output streams are now created by the JournalManager.
   - FSImageStorageInspectors now deals with URIs when referring to edit logs
   - Recovery of inprogress logs is performed by counting the number of 
 transactions instead of looking at the length of the file.
 The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.





[jira] [Commented] (HDFS-2281) NPE in checkpoint during processIOError()

2011-09-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096177#comment-13096177
 ] 

Hadoop QA commented on HDFS-2281:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12492780/HDFS-2281.1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1190//console

This message is automatically generated.

 NPE in checkpoint during processIOError()
 -

 Key: HDFS-2281
 URL: https://issues.apache.org/jira/browse/HDFS-2281
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Uma Maheswara Rao G
 Fix For: 0.22.0

 Attachments: BN-bug-NPE.txt, HDFS-2281.1.patch, HDFS-2281.patch


 At the end of a checkpoint, the BackupNode tries to convergeJournalSpool() 
 and calls revertFileStreams(). The latter closes each file stream and tries 
 to rename the corresponding file to its permanent location, current/edits. 
 If for any reason the rename fails, processIOError() is called for the 
 failed streams. processIOError() will try to close the stream again and will 
 get an NPE in EditLogFileOutputStream.close(), because bufCurrent was set to 
 null by the previous close.





[jira] [Commented] (HDFS-2284) RW Http access to HDFS

2011-09-02 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096182#comment-13096182
 ] 

Alejandro Abdelnur commented on HDFS-2284:
--

@Nicholas, I didn't see a follow-up on my comment about Hoop being used as 
the next HFTP as well.



 RW Http access to HDFS
 --

 Key: HDFS-2284
 URL: https://issues.apache.org/jira/browse/HDFS-2284
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sanjay Radia
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h2284_20110902b.patch


 HFTP allows only read access to HDFS via HTTP. Add read/write HTTP access to 
 HDFS.





[jira] [Updated] (HDFS-362) FSEditLog should not writes long and short as UTF8 and should not use ArrayWritable for writing non-array items

2011-09-02 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-362:
-

Affects Version/s: 0.23.0

 FSEditLog should not writes long and short as UTF8 and should not use 
 ArrayWritable for writing non-array items
 ---

 Key: HDFS-362
 URL: https://issues.apache.org/jira/browse/HDFS-362
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-362.1.patch, HDFS-362.2.patch, HDFS-362.2b.patch, 
 HDFS-362.2c.patch, HDFS-362.2d.patch, HDFS-362.2d.patch, HDFS-362.patch


 In FSEditLog,
 - long and short values are first converted to String and then further 
 converted to UTF8;
 - for some non-array items, it first creates an ArrayWritable object to hold 
 all the items and then writes the ArrayWritable object.
 This results in creating many intermediate objects, which hurts NameNode CPU 
 performance and slows NameNode restart.
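The cost difference can be demonstrated with plain java.io (this is a self-contained illustration, not FSEditLog code): encoding a long by way of a String and UTF-8 produces intermediate objects and a variable-length record, while DataOutputStream.writeLong allocates nothing per value and always writes 8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustration of the inefficiency described: long-as-UTF8-string vs. a
// raw long write.
class EditLogEncodingSketch {
    static byte[] asUtf8String(long v) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(Long.toString(v)); // String + UTF-8 intermediates
        return bos.toByteArray();
    }

    static byte[] asRawLong(long v) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeLong(v); // no intermediate objects, fixed 8 bytes
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        long ts = 1314921600000L; // a millisecond timestamp
        System.out.println(asUtf8String(ts).length); // 15: 2-byte length + 13 digits
        System.out.println(asRawLong(ts).length);    // 8
    }
}
```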





[jira] [Updated] (HDFS-962) Make DFSOutputStream MAX_PACKETS configurable

2011-09-02 Thread Justin Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Joseph updated HDFS-962:
---

Status: Open  (was: Patch Available)

 Make DFSOutputStream MAX_PACKETS configurable
 -

 Key: HDFS-962
 URL: https://issues.apache.org/jira/browse/HDFS-962
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Justin Joseph
Priority: Minor
 Attachments: HDFS-962.1.patch, HDFS-962.patch


 HDFS-959 suggests that the MAX_PACKETS variable (which determines how many 
 outstanding data packets the DFSOutputStream will permit) may have an impact 
 on performance. If so, we should make it configurable to trade off between 
 memory and performance. I think it ought to be a secret/undocumented config 
 for now - this will make it easier to benchmark without confusing users.
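The pattern being discussed can be sketched as follows; this is a hedged illustration using java.util.Properties as a stand-in for Hadoop's Configuration class, with an illustrative key name and default value, not the actual patch.

```java
import java.util.Properties;

// Sketch: replace a hard-coded constant with a configurable value,
// keeping the old constant as the default. Key name and default here
// are illustrative assumptions.
class ConfigurableMaxPackets {
    static final int DEFAULT_MAX_PACKETS = 80; // former hard-coded value (illustrative)

    private final int dfsMaxPackets; // private and non-static

    ConfigurableMaxPackets(Properties conf) {
        dfsMaxPackets = Integer.parseInt(
                conf.getProperty("dfs.max.packets",
                        String.valueOf(DEFAULT_MAX_PACKETS)));
    }

    int maxPackets() { return dfsMaxPackets; }

    public static void main(String[] args) {
        Properties conf = new Properties();
        ConfigurableMaxPackets def = new ConfigurableMaxPackets(conf);
        conf.setProperty("dfs.max.packets", "200");
        ConfigurableMaxPackets tuned = new ConfigurableMaxPackets(conf);
        System.out.println(def.maxPackets() + " " + tuned.maxPackets()); // 80 200
    }
}
```

Because the key has no documented default in hdfs-default.xml, it behaves as the "secret/undocumented config" the description asks for: unset deployments keep the old behavior.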





[jira] [Updated] (HDFS-962) Make DFSOutputStream MAX_PACKETS configurable

2011-09-02 Thread Justin Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Joseph updated HDFS-962:
---

Attachment: HDFS-962.1.patch

 Make DFSOutputStream MAX_PACKETS configurable
 -

 Key: HDFS-962
 URL: https://issues.apache.org/jira/browse/HDFS-962
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Justin Joseph
Priority: Minor
 Attachments: HDFS-962.1.patch, HDFS-962.patch


 HDFS-959 suggests that the MAX_PACKETS variable (which determines how many 
 outstanding data packets the DFSOutputStream will permit) may have an impact 
 on performance. If so, we should make it configurable to trade off between 
 memory and performance. I think it ought to be a secret/undocumented config 
 for now - this will make it easier to benchmark without confusing users.





[jira] [Commented] (HDFS-962) Make DFSOutputStream MAX_PACKETS configurable

2011-09-02 Thread Justin Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096201#comment-13096201
 ] 

Justin Joseph commented on HDFS-962:


Hi Nicholas,

Thanks a lot for taking a look at this patch!

I have addressed your comments.

{quote}
*  change dfsMaxPackets to private and non-static.
{quote}
  Done

{quote}
* define constants for dfs.max.packets in DFSConfigKeys
{quote}
Added in DFSConfigKeys.


Thanks
Justin

 Make DFSOutputStream MAX_PACKETS configurable
 -

 Key: HDFS-962
 URL: https://issues.apache.org/jira/browse/HDFS-962
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Justin Joseph
Priority: Minor
 Attachments: HDFS-962.1.patch, HDFS-962.patch


 HDFS-959 suggests that the MAX_PACKETS variable (which determines how many 
 outstanding data packets the DFSOutputStream will permit) may have an impact 
 on performance. If so, we should make it configurable to trade off between 
 memory and performance. I think it ought to be a secret/undocumented config 
 for now - this will make it easier to benchmark without confusing users.





[jira] [Updated] (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-200:
--

Attachment: HDFS-200.20-security.1.patch

Patch for 20-security branch.

 In HDFS, sync() not yet guarantees data available to the new readers
 

 Key: HDFS-200
 URL: https://issues.apache.org/jira/browse/HDFS-200
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.20-append
Reporter: Tsz Wo (Nicholas), SZE
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append

 Attachments: 4379_20081010TC3.java, HDFS-200.20-security.1.patch, 
 Reader.java, Reader.java, ReopenProblem.java, Writer.java, Writer.java, 
 checkLeases-fix-1.txt, checkLeases-fix-unit-test-1.txt, 
 fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, 
 fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, 
 fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders15_20.txt, 
 fsyncConcurrentReaders16_20.txt, fsyncConcurrentReaders3.patch, 
 fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
 fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
 hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
 hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
 namenode.log, namenode.log, reopen_test.sh


 In the append design doc 
 (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
 says
 * A reader is guaranteed to be able to read data that was 'flushed' before 
 the reader opened the file
 However, this feature is not yet implemented.  Note that the operation 
 'flushed' is now called sync.
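The quoted guarantee can be modeled in a few lines (a toy model, not HDFS code): readers see data only up to the length published by the most recent sync, so bytes written but never synced stay invisible to a reader that opens the file afterwards.

```java
// Toy model (invented, not HDFS code) of the guarantee quoted above:
// a new reader may read only up to the last synced length.
class SyncVisibilitySketch {
    private byte[] data = new byte[0];
    private volatile int syncedLength = 0;

    void write(byte[] more) {
        byte[] next = new byte[data.length + more.length];
        System.arraycopy(data, 0, next, 0, data.length);
        System.arraycopy(more, 0, next, data.length, more.length);
        data = next;
    }

    void sync() { syncedLength = data.length; } // publish to new readers

    int readableLength() { return syncedLength; }

    public static void main(String[] args) {
        SyncVisibilitySketch f = new SyncVisibilitySketch();
        f.write(new byte[5]);
        System.out.println(f.readableLength()); // 0: not yet synced
        f.sync();
        f.write(new byte[3]);
        System.out.println(f.readableLength()); // 5: only synced bytes visible
    }
}
```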





[jira] [Updated] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-142:
--

Attachment: HDFS-142.20-security.1.patch

Patch for 20-security branch uploaded.

 In 0.20, move blocks being written into a blocksBeingWritten directory
 --

 Key: HDFS-142
 URL: https://issues.apache.org/jira/browse/HDFS-142
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20-append
Reporter: Raghu Angadi
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append

 Attachments: HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, 
 HDFS-142-multiple-blocks-datanode-exception.patch, 
 HDFS-142.20-security.1.patch, HDFS-142_20-append2.patch, HDFS-142_20.patch, 
 appendFile-recheck-lease.txt, appendQuestions.txt, deleteTmp.patch, 
 deleteTmp2.patch, deleteTmp5_20.txt, deleteTmp5_20.txt, deleteTmp_0.18.patch, 
 dont-recover-rwr-when-rbw-available.txt, handleTmp1.patch, 
 hdfs-142-commitBlockSynchronization-unknown-datanode.txt, 
 hdfs-142-minidfs-fix-from-409.txt, 
 hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, 
 hdfs-142-testleaserecovery-fix.txt, recentInvalidateSets-assertion-fix.txt, 
 recover-rbw-v2.txt, testfileappend4-deaddn.txt, 
 validateBlockMetaData-synchronized.txt


 Before 0.18, when a Datanode restarts it deletes the files under the 
 data-dir/tmp directory, since these files are no longer valid. But in 0.18 
 it moves these files to the normal directory, incorrectly making them valid 
 blocks. Either of the following would work:
 - remove the tmp files during upgrade, or
 - if the files under /tmp are in the pre-0.18 format (i.e. no generation 
 stamp), delete them.
 Currently the effect of this bug is that these files end up failing block 
 verification and eventually get deleted, but before that they cause 
 incorrect over-replication at the namenode.
 It also looks like our policy regarding the treatment of files under tmp 
 needs to be defined better; right now there are probably one or two more 
 bugs with it. Dhruba, please file them if you remember.
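The second cleanup rule can be sketched like this (file-name patterns are simplified for illustration and may not match the real on-disk layout exactly): meta files under tmp that lack a generation stamp are in the pre-0.18 format and should be deleted on upgrade rather than promoted to valid blocks.

```java
import java.util.*;
import java.util.regex.Pattern;

// Sketch of the upgrade cleanup rule: keep meta files with a generation
// stamp, flag stamp-less ones for deletion. Simplified naming.
class TmpBlockCleanupSketch {
    // Post-0.18 meta files carry a generation stamp: blk_<id>_<genstamp>.meta
    static final Pattern WITH_GENSTAMP =
            Pattern.compile("blk_-?\\d+_\\d+\\.meta");

    static List<String> filesToDelete(List<String> tmpFiles) {
        List<String> stale = new ArrayList<>();
        for (String f : tmpFiles)
            if (f.endsWith(".meta") && !WITH_GENSTAMP.matcher(f).matches())
                stale.add(f); // pre-0.18 format: no generation stamp
        return stale;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "blk_123_1001.meta", "blk_456.meta", "blk_456");
        System.out.println(filesToDelete(files)); // [blk_456.meta]
    }
}
```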





[jira] [Commented] (HDFS-2232) TestHDFSCLI fails on 0.22 branch

2011-09-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096216#comment-13096216
 ] 

Hadoop QA commented on HDFS-2232:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12492779/HDFS-2232.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 131 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

-1 release audit.  The applied patch generated 1 release audit warnings 
(more than the trunk's current 0 warnings).

-1 core tests.  The patch failed these unit tests:

  org.apache.hadoop.hdfs.TestDfsOverAvroRpc
  
org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap
  org.apache.hadoop.hdfs.server.datanode.TestReplicasMap

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1189//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1189//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1189//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1189//console

This message is automatically generated.

 TestHDFSCLI fails on 0.22 branch
 

 Key: HDFS-2232
 URL: https://issues.apache.org/jira/browse/HDFS-2232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2232.patch, HDFS-2232.patch, HDFS-2232.patch, 
 HDFS-2232.patch, TEST-org.apache.hadoop.cli.TestHDFSCLI.txt, 
 TEST-org.apache.hadoop.cli.TestHDFSCLI.txt


 Several HDFS CLI tests fail on the 0.22 branch. I can see three reasons:
 # The regular expressions for host names and paths are not generic enough. 
 Similar to MAPREDUCE-2304.
 # Some command outputs have a newline at the end.
 # Some seem to produce [much] more output than expected.





[jira] [Commented] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-09-02 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096242#comment-13096242
 ] 

Joep Rottinghuis commented on HDFS-2189:


The integration build should be kicked off, because the last published POM 
still has the erroneous reference to org.apache.hadooip#guava. This is 
failing downstream builds. See HBASE-4327.

 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2189-1.patch, patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627





[jira] [Commented] (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096247#comment-13096247
 ] 

Suresh Srinivas commented on HDFS-200:
--

+1 for the patch.

 In HDFS, sync() not yet guarantees data available to the new readers
 

 Key: HDFS-200
 URL: https://issues.apache.org/jira/browse/HDFS-200
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.20-append
Reporter: Tsz Wo (Nicholas), SZE
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append

 Attachments: 4379_20081010TC3.java, HDFS-200.20-security.1.patch, 
 Reader.java, Reader.java, ReopenProblem.java, Writer.java, Writer.java, 
 checkLeases-fix-1.txt, checkLeases-fix-unit-test-1.txt, 
 fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, 
 fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, 
 fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders15_20.txt, 
 fsyncConcurrentReaders16_20.txt, fsyncConcurrentReaders3.patch, 
 fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
 fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
 hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
 hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
 namenode.log, namenode.log, reopen_test.sh


 In the append design doc 
 (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
 says
 * A reader is guaranteed to be able to read data that was 'flushed' before 
 the reader opened the file
 However, this feature is not yet implemented.  Note that the operation 
 'flushed' is now called sync.





[jira] [Updated] (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-200:
-

Fix Version/s: 0.20.205.0

 In HDFS, sync() not yet guarantees data available to the new readers
 

 Key: HDFS-200
 URL: https://issues.apache.org/jira/browse/HDFS-200
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.20-append
Reporter: Tsz Wo (Nicholas), SZE
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0

 Attachments: 4379_20081010TC3.java, HDFS-200.20-security.1.patch, 
 Reader.java, Reader.java, ReopenProblem.java, Writer.java, Writer.java, 
 checkLeases-fix-1.txt, checkLeases-fix-unit-test-1.txt, 
 fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, 
 fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, 
 fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders15_20.txt, 
 fsyncConcurrentReaders16_20.txt, fsyncConcurrentReaders3.patch, 
 fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
 fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
 hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
 hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
 namenode.log, namenode.log, reopen_test.sh


 In the append design doc 
 (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
 says
 * A reader is guaranteed to be able to read data that was 'flushed' before 
 the reader opened the file
 However, this feature is not yet implemented.  Note that the operation 
 'flushed' is now called sync.





[jira] [Commented] (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096250#comment-13096250
 ] 

Suresh Srinivas commented on HDFS-200:
--

I committed this change to 0.20-security.

 In HDFS, sync() not yet guarantees data available to the new readers
 

 Key: HDFS-200
 URL: https://issues.apache.org/jira/browse/HDFS-200
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.20-append
Reporter: Tsz Wo (Nicholas), SZE
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0

 Attachments: 4379_20081010TC3.java, HDFS-200.20-security.1.patch, 
 Reader.java, Reader.java, ReopenProblem.java, Writer.java, Writer.java, 
 checkLeases-fix-1.txt, checkLeases-fix-unit-test-1.txt, 
 fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, 
 fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, 
 fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders15_20.txt, 
 fsyncConcurrentReaders16_20.txt, fsyncConcurrentReaders3.patch, 
 fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
 fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
 hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
 hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
 namenode.log, namenode.log, reopen_test.sh


 In the append design doc 
 (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
 says
 * A reader is guaranteed to be able to read data that was 'flushed' before 
 the reader opened the file
 However, this feature is not yet implemented.  Note that the operation 
 'flushed' is now called sync.





[jira] [Updated] (HDFS-988) saveNamespace race can corrupt the edits log

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-988:
-

Fix Version/s: 0.20.205.0

 saveNamespace race can corrupt the edits log
 

 Key: HDFS-988
 URL: https://issues.apache.org/jira/browse/HDFS-988
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: dhruba borthakur
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0, 0.22.0

 Attachments: 988-fixups.txt, HDFS-988.20-security.patch, 
 HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch, 
 hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-6.patch, hdfs-988-7.patch, 
 hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, 
 saveNamespace_20-append.patch


 The administrator puts the namenode in safemode and then issues the 
 saveNamespace command. This can corrupt the edits log. The problem is that 
 when the NN enters safemode, there could still be pending logSyncs occurring 
 from other threads. Now the saveNamespace command, when executed, would save 
 an edits log with partial writes. I have seen this happen on 0.20.
 https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853
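The race can be illustrated with a small Python model (illustrative only; the real namenode fix is more involved): saveNamespace takes the same lock as the logging threads and drains pending, un-synced edits first, so the saved image never contains partial writes.

```python
import threading

class EditLog:
    """Toy model of the HDFS-988 race (illustrative; the real fix in the
    namenode is more involved). saveNamespace takes the same lock as the
    logging threads and drains pending, un-synced edits first, so the
    saved image never contains partial writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []   # edits logged but not yet logSync'ed
        self._durable = []   # edits safely on disk

    def log_edit(self, txn: str) -> None:
        with self._lock:
            self._pending.append(txn)

    def log_sync(self) -> None:
        with self._lock:
            self._durable += self._pending
            self._pending.clear()

    def save_namespace(self) -> list:
        with self._lock:                     # no logging thread interleaves here
            self._durable += self._pending   # drain pending edits first
            self._pending.clear()
            return list(self._durable)

log = EditLog()
log.log_edit("mkdir /a")
log.log_edit("create /a/b")   # still pending, not synced
image = log.save_namespace()  # snapshot contains both edits, no torn write
```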





[jira] [Updated] (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-826:
--

Attachment: HDFS-826.20-security.1.patch

 Allow a mechanism for an application to detect that datanode(s)  have died in 
 the write pipeline
 

 Key: HDFS-826
 URL: https://issues.apache.org/jira/browse/HDFS-826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.20-append
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.20-append, 0.21.0

 Attachments: HDFS-826-0.20-v2.patch, HDFS-826-0.20.patch, 
 HDFS-826.20-security.1.patch, Replicable4.txt, ReplicableHdfs.txt, 
 ReplicableHdfs2.txt, ReplicableHdfs3.txt


 HDFS does not replicate the last block of a file that is currently being 
 written to by an application. Every datanode death in the write pipeline 
 decreases the reliability of the last block of the file being written. 
 This situation can be improved if the application can be notified of a 
 datanode death in the write pipeline. Then, the application can decide on 
 the right course of action to take on this event.
 In our use case, the application can close the file on the first datanode 
 death and start writing to a newly created file. This keeps the 
 reliability guarantee of a block close to 3 at all times.
 One idea is to make DFSOutputStream.write() throw an exception if the number 
 of datanodes in the write pipeline falls below the minimum.replication.factor 
 set on the client (this is backward compatible).
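The proposed behavior can be sketched like this (hypothetical names; the real class is DFSOutputStream, but this is not its API): write() raises as soon as the live pipeline falls below the client-side minimum, letting the application close the file and start a fresh one.

```python
class PipelineDeadError(IOError):
    """Raised when too few replicas remain in the write pipeline."""

class PipelineWriter:
    """Sketch of the proposed behavior with hypothetical names (the real
    class is DFSOutputStream, but this is not its API)."""

    def __init__(self, datanodes, min_replication=1):
        self.live = set(datanodes)
        self.min_replication = min_replication

    def datanode_died(self, dn):
        self.live.discard(dn)

    def write(self, data: bytes) -> int:
        if len(self.live) < self.min_replication:
            raise PipelineDeadError(
                "pipeline has %d nodes, need >= %d"
                % (len(self.live), self.min_replication))
        return len(data)   # pretend the bytes reached every live replica

w = PipelineWriter(["dn1", "dn2", "dn3"], min_replication=2)
w.write(b"ok")             # 3 live nodes: fine
w.datanode_died("dn1")
w.write(b"still ok")       # 2 live nodes == minimum: still fine
w.datanode_died("dn2")     # the next write() raises, so the app can close
                           # the file and start a new one
```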





[jira] [Commented] (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096253#comment-13096253
 ] 

Jitendra Nath Pandey commented on HDFS-826:
---

Uploaded patch for 20-security branch.

 Allow a mechanism for an application to detect that datanode(s)  have died in 
 the write pipeline
 

 Key: HDFS-826
 URL: https://issues.apache.org/jira/browse/HDFS-826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.20-append
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.20-append, 0.21.0

 Attachments: HDFS-826-0.20-v2.patch, HDFS-826-0.20.patch, 
 HDFS-826.20-security.1.patch, Replicable4.txt, ReplicableHdfs.txt, 
 ReplicableHdfs2.txt, ReplicableHdfs3.txt


 HDFS does not replicate the last block of a file that is currently being 
 written to by an application. Every datanode death in the write pipeline 
 decreases the reliability of the last block of the file being written. 
 This situation can be improved if the application can be notified of a 
 datanode death in the write pipeline. Then, the application can decide on 
 the right course of action to take on this event.
 In our use case, the application can close the file on the first datanode 
 death and start writing to a newly created file. This keeps the 
 reliability guarantee of a block close to 3 at all times.
 One idea is to make DFSOutputStream.write() throw an exception if the number 
 of datanodes in the write pipeline falls below the minimum.replication.factor 
 set on the client (this is backward compatible).





[jira] [Commented] (HDFS-988) saveNamespace race can corrupt the edits log

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096254#comment-13096254
 ] 

Suresh Srinivas commented on HDFS-988:
--

+1 for the 20-security patch.

 saveNamespace race can corrupt the edits log
 

 Key: HDFS-988
 URL: https://issues.apache.org/jira/browse/HDFS-988
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: dhruba borthakur
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0, 0.22.0

 Attachments: 988-fixups.txt, HDFS-988.20-security.patch, 
 HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch, 
 hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-6.patch, hdfs-988-7.patch, 
 hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, 
 saveNamespace_20-append.patch


 The administrator puts the namenode in safemode and then issues the 
 saveNamespace command. This can corrupt the edits log. The problem is that 
 when the NN enters safemode, there could still be pending logSyncs occurring 
 from other threads. Now the saveNamespace command, when executed, would save 
 an edits log with partial writes. I have seen this happen on 0.20.
 https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853





[jira] [Updated] (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-630:
--

Attachment: HDFS-630.20-security.1.patch

Patch for 20-security branch uploaded.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.20-append
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Fix For: 0.20-append, 0.21.0

 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 HDFS-630.20-security.1.patch, HDFS-630.patch, hdfs-630-0.20-append.patch, 
 hdfs-630-0.20.txt


 created from hdfs-200.
 If, during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations for the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN 
 the excluded datanodes. The list of dead datanodes is only for one block 
 allocation.
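The proposed allocation rule can be sketched as follows (illustrative names, not the NameNode's actual API): the client sends the datanodes it could not reach, and the namenode skips them for this single block allocation.

```python
def choose_targets(all_datanodes, replication, excluded=()):
    """Sketch of the HDFS-630 idea (illustrative, not the NameNode's real
    API): skip the datanodes the client could not reach, for this one
    block allocation only."""
    banned = set(excluded)
    candidates = [dn for dn in all_datanodes if dn not in banned]
    if len(candidates) < replication:
        raise IOError("not enough live datanodes for the requested replication")
    return candidates[:replication]

cluster = ["dn1", "dn2", "dn3"]
# On a small cluster a plain retry may hand back the dead node again;
# passing it as excluded guarantees a different target set.
targets = choose_targets(cluster, replication=2, excluded={"dn2"})
```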





[jira] [Commented] (HDFS-988) saveNamespace race can corrupt the edits log

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096261#comment-13096261
 ] 

Suresh Srinivas commented on HDFS-988:
--

I committed the patch to 0.20-security branch.

 saveNamespace race can corrupt the edits log
 

 Key: HDFS-988
 URL: https://issues.apache.org/jira/browse/HDFS-988
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: dhruba borthakur
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0, 0.22.0

 Attachments: 988-fixups.txt, HDFS-988.20-security.patch, 
 HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch, 
 hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-6.patch, hdfs-988-7.patch, 
 hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, 
 saveNamespace_20-append.patch


 The administrator puts the namenode in safemode and then issues the 
 saveNamespace command. This can corrupt the edits log. The problem is that 
 when the NN enters safemode, there could still be pending logSyncs occurring 
 from other threads. Now the saveNamespace command, when executed, would save 
 an edits log with partial writes. I have seen this happen on 0.20.
 https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853





[jira] [Commented] (HDFS-988) saveNamespace race can corrupt the edits log

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096262#comment-13096262
 ] 

Suresh Srinivas commented on HDFS-988:
--

+1 for the patch.

 saveNamespace race can corrupt the edits log
 

 Key: HDFS-988
 URL: https://issues.apache.org/jira/browse/HDFS-988
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: dhruba borthakur
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0, 0.22.0

 Attachments: 988-fixups.txt, HDFS-988.20-security.patch, 
 HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch, 
 hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-6.patch, hdfs-988-7.patch, 
 hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, 
 saveNamespace_20-append.patch


 The administrator puts the namenode in safemode and then issues the 
 saveNamespace command. This can corrupt the edits log. The problem is that 
 when the NN enters safemode, there could still be pending logSyncs occurring 
 from other threads. Now the saveNamespace command, when executed, would save 
 an edits log with partial writes. I have seen this happen on 0.20.
 https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853





[jira] [Updated] (HDFS-1054) Remove unnecessary sleep after failure in nextBlockOutputStream

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1054:
---

Attachment: HDFS-1054.20-security.1.patch

Patch for 20-security branch.

 Remove unnecessary sleep after failure in nextBlockOutputStream
 ---

 Key: HDFS-1054
 URL: https://issues.apache.org/jira/browse/HDFS-1054
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.20.3, 0.20-append, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.20-append, 0.21.0

 Attachments: HDFS-1054.20-security.1.patch, 
 hdfs-1054-0.20-append.txt, hdfs-1054.txt, hdfs-1054.txt


 If DFSOutputStream fails to create a pipeline, it currently sleeps 6 seconds 
 before retrying. I don't see a great reason to wait at all, much less 6 
 seconds (especially now that HDFS-630 ensures that a retry won't go back to 
 the bad node). We should at least make it configurable, and perhaps something 
 like backoff makes some sense.
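A configurable retry with exponential backoff along those lines might look like this (a sketch, not the actual patch; delays are collected instead of slept so the example runs instantly):

```python
import itertools

def next_block_output_stream(connect, retries=3, base_delay=6.0):
    """Sketch of retry-with-exponential-backoff replacing the fixed 6 s
    sleep (not the actual patch). Delays are collected rather than slept
    so the example runs instantly."""
    delays = []
    for attempt in range(retries):
        try:
            return connect(), delays
        except IOError:
            delays.append(base_delay * (2 ** attempt))   # 6, 12, 24, ...
    raise IOError("could not create block pipeline after %d tries" % retries)

attempts = itertools.count()

def flaky_connect():
    if next(attempts) < 2:       # the first two attempts hit a bad datanode
        raise IOError("bad datanode")
    return "pipeline"

result, delays = next_block_output_stream(flaky_connect)
```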





[jira] [Updated] (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-826:
-

Fix Version/s: 0.20.205.0

 Allow a mechanism for an application to detect that datanode(s)  have died in 
 the write pipeline
 

 Key: HDFS-826
 URL: https://issues.apache.org/jira/browse/HDFS-826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.20-append
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.20-append, 0.20.205.0, 0.21.0

 Attachments: HDFS-826-0.20-v2.patch, HDFS-826-0.20.patch, 
 HDFS-826.20-security.1.patch, Replicable4.txt, ReplicableHdfs.txt, 
 ReplicableHdfs2.txt, ReplicableHdfs3.txt


 HDFS does not replicate the last block of a file that is currently being 
 written to by an application. Every datanode death in the write pipeline 
 decreases the reliability of the last block of the file being written. 
 This situation can be improved if the application can be notified of a 
 datanode death in the write pipeline. Then, the application can decide on 
 the right course of action to take on this event.
 In our use case, the application can close the file on the first datanode 
 death and start writing to a newly created file. This keeps the 
 reliability guarantee of a block close to 3 at all times.
 One idea is to make DFSOutputStream.write() throw an exception if the number 
 of datanodes in the write pipeline falls below the minimum.replication.factor 
 set on the client (this is backward compatible).





[jira] [Commented] (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096267#comment-13096267
 ] 

Suresh Srinivas commented on HDFS-826:
--

+1 for the patch. I committed the patch to 0.20-security branch.

 Allow a mechanism for an application to detect that datanode(s)  have died in 
 the write pipeline
 

 Key: HDFS-826
 URL: https://issues.apache.org/jira/browse/HDFS-826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.20-append
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.20-append, 0.20.205.0, 0.21.0

 Attachments: HDFS-826-0.20-v2.patch, HDFS-826-0.20.patch, 
 HDFS-826.20-security.1.patch, Replicable4.txt, ReplicableHdfs.txt, 
 ReplicableHdfs2.txt, ReplicableHdfs3.txt


 HDFS does not replicate the last block of a file that is currently being 
 written to by an application. Every datanode death in the write pipeline 
 decreases the reliability of the last block of the file being written. 
 This situation can be improved if the application can be notified of a 
 datanode death in the write pipeline. Then, the application can decide on 
 the right course of action to take on this event.
 In our use case, the application can close the file on the first datanode 
 death and start writing to a newly created file. This keeps the 
 reliability guarantee of a block close to 3 at all times.
 One idea is to make DFSOutputStream.write() throw an exception if the number 
 of datanodes in the write pipeline falls below the minimum.replication.factor 
 set on the client (this is backward compatible).





[jira] [Updated] (HDFS-1141) completeFile does not check lease ownership

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1141:
---

Attachment: HDFS-1141.20-security.1.patch

Patch for 20-security uploaded.

 completeFile does not check lease ownership
 ---

 Key: HDFS-1141
 URL: https://issues.apache.org/jira/browse/HDFS-1141
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.20-append, 0.22.0

 Attachments: HDFS-1141.20-security.1.patch, hdfs-1141-branch20.txt, 
 hdfs-1141.txt, hdfs-1141.txt


 completeFile should check that the caller still owns the lease of the file 
 that it's completing. This is for the 'testCompleteOtherLeaseHoldersFile' 
 case in HDFS-1139.





[jira] [Commented] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096275#comment-13096275
 ] 

Suresh Srinivas commented on HDFS-142:
--

Can you please add a license banner to TestFileAppend4.java?

 In 0.20, move blocks being written into a blocksBeingWritten directory
 --

 Key: HDFS-142
 URL: https://issues.apache.org/jira/browse/HDFS-142
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20-append
Reporter: Raghu Angadi
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append

 Attachments: HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, 
 HDFS-142-multiple-blocks-datanode-exception.patch, 
 HDFS-142.20-security.1.patch, HDFS-142_20-append2.patch, HDFS-142_20.patch, 
 appendFile-recheck-lease.txt, appendQuestions.txt, deleteTmp.patch, 
 deleteTmp2.patch, deleteTmp5_20.txt, deleteTmp5_20.txt, deleteTmp_0.18.patch, 
 dont-recover-rwr-when-rbw-available.txt, handleTmp1.patch, 
 hdfs-142-commitBlockSynchronization-unknown-datanode.txt, 
 hdfs-142-minidfs-fix-from-409.txt, 
 hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, 
 hdfs-142-testleaserecovery-fix.txt, recentInvalidateSets-assertion-fix.txt, 
 recover-rbw-v2.txt, testfileappend4-deaddn.txt, 
 validateBlockMetaData-synchronized.txt


 Before 0.18, when a Datanode restarted, it deleted files under the 
 data-dir/tmp directory since these files were no longer valid. But in 0.18 it 
 moves these files to the normal directory, incorrectly making them valid 
 blocks. One of the following would work:
 - remove the tmp files during upgrade, or
 - if the files under /tmp are in pre-18 format (i.e. no generation stamp), 
 delete them.
 Currently the effect of this bug is that these files end up failing block 
 verification and eventually get deleted, but they cause incorrect 
 over-replication at the namenode before that.
 Also, it looks like our policy regarding files under tmp needs to be defined 
 better. Right now there are probably one or two more bugs with it. Dhruba, 
 please file them if you remember.
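The first option could be sketched as an upgrade-time filter. Note the naming pattern assumed here (post-0.18 replicas carrying a generation stamp, blk_&lt;id&gt;_&lt;genStamp&gt;) is an illustration, not the exact on-disk layout:

```python
import re

def stale_tmp_blocks(tmp_files):
    """Sketch of the first option: during upgrade, delete tmp files in
    pre-0.18 format. The naming pattern (post-0.18 replicas carry a
    generation stamp, blk_<id>_<genStamp>) is an assumption made for
    illustration."""
    has_gen_stamp = re.compile(r"^blk_-?\d+_\d+$")
    return [f for f in tmp_files if not has_gen_stamp.match(f)]

tmp_dir = ["blk_123", "blk_456_1001", "blk_-9_7"]
to_delete = stale_tmp_blocks(tmp_dir)   # only the stamp-less pre-0.18 file
```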





[jira] [Updated] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-142:
-

Fix Version/s: 0.20.205.0

 In 0.20, move blocks being written into a blocksBeingWritten directory
 --

 Key: HDFS-142
 URL: https://issues.apache.org/jira/browse/HDFS-142
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20-append
Reporter: Raghu Angadi
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, 
 HDFS-142-multiple-blocks-datanode-exception.patch, 
 HDFS-142.20-security.1.patch, HDFS-142.20-security.2.patch, 
 HDFS-142_20-append2.patch, HDFS-142_20.patch, appendFile-recheck-lease.txt, 
 appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch, deleteTmp5_20.txt, 
 deleteTmp5_20.txt, deleteTmp_0.18.patch, 
 dont-recover-rwr-when-rbw-available.txt, handleTmp1.patch, 
 hdfs-142-commitBlockSynchronization-unknown-datanode.txt, 
 hdfs-142-minidfs-fix-from-409.txt, 
 hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, 
 hdfs-142-testleaserecovery-fix.txt, recentInvalidateSets-assertion-fix.txt, 
 recover-rbw-v2.txt, testfileappend4-deaddn.txt, 
 validateBlockMetaData-synchronized.txt


 Before 0.18, when a Datanode restarted, it deleted files under the 
 data-dir/tmp directory since these files were no longer valid. But in 0.18 it 
 moves these files to the normal directory, incorrectly making them valid 
 blocks. One of the following would work:
 - remove the tmp files during upgrade, or
 - if the files under /tmp are in pre-18 format (i.e. no generation stamp), 
 delete them.
 Currently the effect of this bug is that these files end up failing block 
 verification and eventually get deleted, but they cause incorrect 
 over-replication at the namenode before that.
 Also, it looks like our policy regarding files under tmp needs to be defined 
 better. Right now there are probably one or two more bugs with it. Dhruba, 
 please file them if you remember.





[jira] [Updated] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-142:
--

Attachment: HDFS-142.20-security.2.patch

Added Apache License header.

 In 0.20, move blocks being written into a blocksBeingWritten directory
 --

 Key: HDFS-142
 URL: https://issues.apache.org/jira/browse/HDFS-142
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20-append
Reporter: Raghu Angadi
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, 
 HDFS-142-multiple-blocks-datanode-exception.patch, 
 HDFS-142.20-security.1.patch, HDFS-142.20-security.2.patch, 
 HDFS-142_20-append2.patch, HDFS-142_20.patch, appendFile-recheck-lease.txt, 
 appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch, deleteTmp5_20.txt, 
 deleteTmp5_20.txt, deleteTmp_0.18.patch, 
 dont-recover-rwr-when-rbw-available.txt, handleTmp1.patch, 
 hdfs-142-commitBlockSynchronization-unknown-datanode.txt, 
 hdfs-142-minidfs-fix-from-409.txt, 
 hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, 
 hdfs-142-testleaserecovery-fix.txt, recentInvalidateSets-assertion-fix.txt, 
 recover-rbw-v2.txt, testfileappend4-deaddn.txt, 
 validateBlockMetaData-synchronized.txt


 Before 0.18, when a Datanode restarted, it deleted files under the 
 data-dir/tmp directory since these files were no longer valid. But in 0.18 it 
 moves these files to the normal directory, incorrectly making them valid 
 blocks. One of the following would work:
 - remove the tmp files during upgrade, or
 - if the files under /tmp are in pre-18 format (i.e. no generation stamp), 
 delete them.
 Currently the effect of this bug is that these files end up failing block 
 verification and eventually get deleted, but they cause incorrect 
 over-replication at the namenode before that.
 Also, it looks like our policy regarding files under tmp needs to be defined 
 better. Right now there are probably one or two more bugs with it. Dhruba, 
 please file them if you remember.





[jira] [Updated] (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1204:
---

Attachment: HDFS-1204.20-security.1.patch

Patch for 20-security.

 0.20: Lease expiration should recover single files, not entire lease holder
 ---

 Key: HDFS-1204
 URL: https://issues.apache.org/jira/browse/HDFS-1204
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: sam rash
 Fix For: 0.20-append

 Attachments: HDFS-1204.20-security.1.patch, hdfs-1204.txt, 
 hdfs-1204.txt


 This was brought up in HDFS-200 but didn't make it into the branch on Apache.





[jira] [Updated] (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1057:
---

Attachment: HDFS-1057.20-security.1.patch

Patch for 20-security branch uploaded.

 Concurrent readers hit ChecksumExceptions if following a writer to very end 
 of file
 ---

 Key: HDFS-1057
 URL: https://issues.apache.org/jira/browse/HDFS-1057
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: sam rash
Priority: Blocker
 Fix For: 0.20-append, 0.21.0, 0.22.0

 Attachments: HDFS-1057-0.20-append.patch, 
 HDFS-1057.20-security.1.patch, conurrent-reader-patch-1.txt, 
 conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, 
 hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt, 
 hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, hdfs-1057-trunk-6.txt


 In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before 
 calling flush(). Therefore, if there is a concurrent reader, it's possible to 
 race here - the reader will see the new length while those bytes are still in 
 the buffers of BlockReceiver. Thus the client will potentially see checksum 
 errors or EOFs. Additionally, the last checksum chunk of the file is made 
 accessible to readers even though it is not stable.
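The ordering fix can be modeled in a few lines (a toy model, not the real BlockReceiver): the reader-visible length is advanced only after the bytes are flushed, so a concurrent reader never touches unstable data.

```python
class BlockReceiver:
    """Toy model of the ordering fix (not the real BlockReceiver): the
    length advertised to concurrent readers is advanced only after the
    bytes are flushed, so a reader never touches unstable data."""

    def __init__(self):
        self.flushed = bytearray()   # bytes a concurrent reader can fetch
        self.visible_len = 0         # length a reader is allowed to trust

    def receive_packet(self, data: bytes) -> None:
        self.flushed += data                   # flush first...
        self.visible_len = len(self.flushed)   # ...then publish the length

    def read_from(self, pos: int) -> bytes:
        # Readers cap their reads at visible_len, so every byte returned
        # is already stable -- no checksum-error window.
        return bytes(self.flushed[pos:self.visible_len])

r = BlockReceiver()
r.receive_packet(b"abcd")
tail = r.read_from(0)
```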





[jira] [Updated] (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-724:
--

Attachment: HDFS-724.20-security.1.patch

Patch for 20-security uploaded.

 Pipeline close hangs if one of the datanode is not responsive.
 --

 Key: HDFS-724
 URL: https://issues.apache.org/jira/browse/HDFS-724
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.20-append, 0.21.0

 Attachments: HDFS-724.20-security.1.patch, h724_20091021.patch, 
 hbAckReply.patch, pipelineHeartbeat.patch, pipelineHeartbeat1.patch, 
 pipelineHeartbeat2.patch, stuckWriteAppend20.patch


 In the new pipeline design, pipeline close is implemented by sending an 
 additional empty packet. If one of the datanodes does not respond to this 
 empty packet, the pipeline hangs. It seems that there is no timeout.
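The missing timeout can be sketched like this (illustrative; `ack_queue` stands in for the pipeline's ack stream): instead of blocking forever on the ack for the empty close packet, the client gives up after a bounded wait.

```python
import queue

def wait_for_close_ack(ack_queue, timeout_s):
    """Sketch of the missing timeout (illustrative; `ack_queue` stands in
    for the pipeline's ack stream): instead of blocking forever on the
    ack for the empty close packet, give up after timeout_s."""
    try:
        return ack_queue.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError("no ack for close packet within %.2f s" % timeout_s)

acks = queue.Queue()
acks.put("SUCCESS")                            # a responsive pipeline
status = wait_for_close_ack(acks, timeout_s=0.1)

silent = queue.Queue()                         # an unresponsive datanode
```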





[jira] [Commented] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096328#comment-13096328
 ] 

Suresh Srinivas commented on HDFS-142:
--

+1 for the patch.

 In 0.20, move blocks being written into a blocksBeingWritten directory
 --

 Key: HDFS-142
 URL: https://issues.apache.org/jira/browse/HDFS-142
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20-append
Reporter: Raghu Angadi
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, 
 HDFS-142-multiple-blocks-datanode-exception.patch, 
 HDFS-142.20-security.1.patch, HDFS-142.20-security.2.patch, 
 HDFS-142_20-append2.patch, HDFS-142_20.patch, appendFile-recheck-lease.txt, 
 appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch, deleteTmp5_20.txt, 
 deleteTmp5_20.txt, deleteTmp_0.18.patch, 
 dont-recover-rwr-when-rbw-available.txt, handleTmp1.patch, 
 hdfs-142-commitBlockSynchronization-unknown-datanode.txt, 
 hdfs-142-minidfs-fix-from-409.txt, 
 hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, 
 hdfs-142-testleaserecovery-fix.txt, recentInvalidateSets-assertion-fix.txt, 
 recover-rbw-v2.txt, testfileappend4-deaddn.txt, 
 validateBlockMetaData-synchronized.txt


 Before 0.18, when a Datanode restarted, it deleted files under the 
 data-dir/tmp directory since these files were no longer valid. But in 0.18 it 
 moves these files to the normal directory, incorrectly making them valid 
 blocks. One of the following would work:
 - remove the tmp files during upgrade, or
 - if the files under /tmp are in pre-18 format (i.e. no generation stamp), 
 delete them.
 Currently the effect of this bug is that these files end up failing block 
 verification and eventually get deleted, but they cause incorrect 
 over-replication at the namenode before that.
 Also, it looks like our policy regarding files under tmp needs to be defined 
 better. Right now there are probably one or two more bugs with it. Dhruba, 
 please file them if you remember.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096333#comment-13096333
 ] 

Suresh Srinivas commented on HDFS-142:
--

I committed the patch to the 0.20-security branch.

 In 0.20, move blocks being written into a blocksBeingWritten directory
 --

 Key: HDFS-142
 URL: https://issues.apache.org/jira/browse/HDFS-142
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20-append
Reporter: Raghu Angadi
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, 
 HDFS-142-multiple-blocks-datanode-exception.patch, 
 HDFS-142.20-security.1.patch, HDFS-142.20-security.2.patch, 
 HDFS-142_20-append2.patch, HDFS-142_20.patch, appendFile-recheck-lease.txt, 
 appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch, deleteTmp5_20.txt, 
 deleteTmp5_20.txt, deleteTmp_0.18.patch, 
 dont-recover-rwr-when-rbw-available.txt, handleTmp1.patch, 
 hdfs-142-commitBlockSynchronization-unknown-datanode.txt, 
 hdfs-142-minidfs-fix-from-409.txt, 
 hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, 
 hdfs-142-testleaserecovery-fix.txt, recentInvalidateSets-assertion-fix.txt, 
 recover-rbw-v2.txt, testfileappend4-deaddn.txt, 
 validateBlockMetaData-synchronized.txt


 Before 0.18, when the Datanode restarted it deleted files under the 
 data-dir/tmp directory, since those files were no longer valid. But in 0.18 it 
 moves these files to the normal directory, incorrectly making them valid 
 blocks. Either of the following would work:
 - remove the tmp files during upgrade, or
 - if the files under /tmp are in pre-0.18 format (i.e. no generation stamp), 
 delete them.
 Currently the effect of this bug is that these files end up failing block 
 verification and eventually get deleted, but they cause incorrect 
 over-replication at the namenode before that.
 Also, our policy regarding files under tmp needs to be defined better; right 
 now there are probably one or two more bugs with it. 
 Dhruba, please file them if you remember.





[jira] [Updated] (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-895:
--

Attachment: HDFS-895.20-security.1.patch

Patch uploaded for 20-security.

 Allow hflush/sync to occur in parallel with new writes to the file
 --

 Key: HDFS-895
 URL: https://issues.apache.org/jira/browse/HDFS-895
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: dhruba borthakur
Assignee: Todd Lipcon
 Fix For: 0.20-append, 0.22.0

 Attachments: 895-delta-for-review.txt, HDFS-895.20-security.1.patch, 
 hdfs-895-0.20-append.txt, hdfs-895-0.20-append.txt, hdfs-895-20.txt, 
 hdfs-895-branch-20-append.txt, hdfs-895-ontopof-1497.txt, 
 hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt, hdfs-895.txt, 
 hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt


 In the current trunk, the HDFS client methods writeChunk() and hflush/sync 
 are synchronized. This means that if an hflush/sync is in progress, an 
 application cannot write data to the HDFS client buffer. This reduces the 
 write throughput of the transaction log in HBase. 
 The hflush/sync should allow new writes to happen to the HDFS client even 
 when an hflush/sync is in progress. It can record the seqno of the message for 
 which it should receive the ack, indicate to the DataStreamer thread to start 
 flushing those messages, exit the synchronized section, and just wait for that 
 ack to arrive.
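The scheme described above can be sketched in a few lines. The class and field names below are invented for illustration and are not the actual DFSClient internals: hflush records the target seqno while holding the lock, then waits for the ack outside the synchronized section so writers are not blocked in the meantime.

```java
public class ParallelFlushSketch {
    private final Object lock = new Object();
    private long lastQueuedSeqno = 0;            // guarded by lock
    private volatile long lastAckedSeqno = 0;    // updated by the ack path

    void write() {
        synchronized (lock) { lastQueuedSeqno++; }   // writers hold the lock only briefly
    }

    void hflush() throws InterruptedException {
        long toWaitFor;
        synchronized (lock) { toWaitFor = lastQueuedSeqno; }  // record target, release lock
        while (lastAckedSeqno < toWaitFor) {                  // wait outside the critical section
            synchronized (lock) { lock.wait(10); }
        }
    }

    void ack(long seqno) {
        lastAckedSeqno = seqno;
        synchronized (lock) { lock.notifyAll(); }
    }

    public static void main(String[] args) throws InterruptedException {
        ParallelFlushSketch s = new ParallelFlushSketch();
        s.write(); s.write();
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            s.ack(2);                            // simulate the streamer acking packet 2
        }).start();
        s.hflush();                              // returns once seqno 2 is acked
        System.out.println("flushed up to seqno " + s.lastAckedSeqno);
    }
}
```

While hflush waits, other threads can keep calling write(), which is the whole point of the change.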





[jira] [Commented] (HDFS-2288) Replicas awaiting recovery should return a full visible length

2011-09-02 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096336#comment-13096336
 ] 

Todd Lipcon commented on HDFS-2288:
---

Nicholas: given the above, do you think this patch is correct?

 Replicas awaiting recovery should return a full visible length
 --

 Key: HDFS-2288
 URL: https://issues.apache.org/jira/browse/HDFS-2288
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: 0.23.0

 Attachments: hdfs-2288.txt


 Currently, if the client calls getReplicaVisibleLength for a RWR, it returns 
 a visible length of 0. This causes one of HBase's tests to fail, and I 
 believe it's incorrect behavior.





[jira] [Updated] (HDFS-1520) HDFS 20 append: Lightweight NameNode operation to trigger lease recovery

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1520:
---

Attachment: HDFS-1520.20-security.1.patch

Patch for 20-security.

 HDFS 20 append: Lightweight NameNode operation to trigger lease recovery
 

 Key: HDFS-1520
 URL: https://issues.apache.org/jira/browse/HDFS-1520
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: name-node
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append

 Attachments: HDFS-1520.20-security.1.patch, recoverLeaseApache20.patch


 Currently HBase uses append to trigger the close of the HLog during HLog split. 
 Append is a very expensive operation, which involves not only NameNode 
 operations but also creating a write pipeline. If one of the datanodes in the 
 pipeline has a problem, this recovery may take minutes. I'd like to implement a 
 lightweight NameNode operation to trigger lease recovery and make HBase 
 use it instead.





[jira] [Updated] (HDFS-1555) HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1555:
---

Attachment: HDFS-1555.20-security.1.patch

Patch for 20-security.

 HDFS 20 append: Disallow pipeline recovery if a file is already being lease 
 recovered
 -

 Key: HDFS-1555
 URL: https://issues.apache.org/jira/browse/HDFS-1555
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append

 Attachments: HDFS-1555.20-security.1.patch, appendRecoveryRace.patch, 
 recoveryRace.patch


 When a file is under lease recovery and the writer is still alive, the write 
 pipeline will be killed and the writer will then start a pipeline recovery. 
 Sometimes the pipeline recovery may race ahead of the lease recovery and, as a 
 result, cause the lease recovery to fail. This is very bad if we want to support 
 the strong recoverLease semantics in HDFS-1554. So it would be nice if we could 
 disallow a file's pipeline recovery while its lease recovery is in progress.





[jira] [Updated] (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-630:
-

Fix Version/s: 0.20.205.0

I committed the patch to the 0.20-security branch.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.20-append
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Fix For: 0.20-append, 0.20.205.0, 0.21.0

 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 HDFS-630.20-security.1.patch, HDFS-630.patch, hdfs-630-0.20-append.patch, 
 hdfs-630-0.20.txt


 Created from HDFS-200.
 If, during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations for the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN 
 the excluded datanodes. The list of dead datanodes is only for one block 
 allocation.
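A minimal sketch of the proposed allocation behavior, with a mocked namenode-side chooser; all names here are hypothetical, not the real NameNode API:

```java
import java.util.*;

public class ExcludeNodesSketch {
    // Stand-in for the namenode's target chooser: skip nodes the client
    // has already reported as unreachable for this block allocation.
    static String allocateBlock(List<String> liveNodes, Set<String> excluded) {
        for (String dn : liveNodes) {
            if (!excluded.contains(dn)) return dn;
        }
        return null;   // no usable node left
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3");
        Set<String> excluded = new HashSet<>();
        excluded.add("dn1");   // client failed to connect to dn1 on the last attempt
        System.out.println("chosen: " + allocateBlock(nodes, excluded));
    }
}
```

The excluded set is per-allocation, matching the description above: a node rejected for one block can still be chosen for the next.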





[jira] [Updated] (HDFS-1554) Append 0.20: New semantics for recoverLease

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1554:
---

Attachment: HDFS-1554.20-security.1.patch

Patch for 20-security.

 Append 0.20: New semantics for recoverLease
 ---

 Key: HDFS-1554
 URL: https://issues.apache.org/jira/browse/HDFS-1554
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append

 Attachments: HDFS-1554.20-security.1.patch, appendRecoverLease.patch, 
 appendRecoverLease1.patch


 The current recoverLease API implemented in append 0.20 aims to provide a 
 lighter-weight way (compared to using create/append) to trigger a file's soft 
 lease expiration. From the use cases of both HBase and Scribe, it could have 
 stronger semantics: revoking the file's lease, thus starting lease recovery 
 immediately.
 I'd also like to port this recoverLease API to HDFS 0.22 and trunk since 
 HBase is moving to HDFS 0.22.





[jira] [Updated] (HDFS-1054) Remove unnecessary sleep after failure in nextBlockOutputStream

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1054:
--

Fix Version/s: 0.20.205.0

+1 for the patch. I committed the patch to the 0.20-security branch.

 Remove unnecessary sleep after failure in nextBlockOutputStream
 ---

 Key: HDFS-1054
 URL: https://issues.apache.org/jira/browse/HDFS-1054
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.20.3, 0.20-append, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.20-append, 0.20.205.0, 0.21.0

 Attachments: HDFS-1054.20-security.1.patch, 
 hdfs-1054-0.20-append.txt, hdfs-1054.txt, hdfs-1054.txt


 If DFSOutputStream fails to create a pipeline, it currently sleeps 6 seconds 
 before retrying. I don't see a great reason to wait at all, much less 6 
 seconds (especially now that HDFS-630 ensures that a retry won't go back to 
 the bad node). We should at least make it configurable, and perhaps something 
 like backoff makes some sense.
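The "something like backoff" suggestion might look like the following capped exponential backoff; the base and cap values are invented for illustration, not Hadoop defaults:

```java
public class BackoffSketch {
    // Delay grows geometrically per attempt, capped so retries never
    // wait longer than a configured maximum.
    static long backoffMillis(int attempt) {
        long base = 100;                         // hypothetical base delay (ms)
        long cap = 5000;                         // hypothetical maximum delay (ms)
        return Math.min(cap, base << attempt);   // 100, 200, 400, ...
    }

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {
            System.out.println(i + " -> " + backoffMillis(i) + "ms");
        }
    }
}
```

The first retry fires almost immediately (addressing "I don't see a great reason to wait at all"), while repeated failures back off instead of hammering the NN every 6 seconds.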





[jira] [Updated] (HDFS-1207) 0.20-append: stallReplicationWork should be volatile

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1207:
--

Fix Version/s: 0.20.205.0

I applied the attached patch to the 0.20-security branch.

 0.20-append: stallReplicationWork should be volatile
 

 Key: HDFS-1207
 URL: https://issues.apache.org/jira/browse/HDFS-1207
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.20-append, 0.20.205.0

 Attachments: hdfs-1207.txt


 The stallReplicationWork member in FSNamesystem is accessed by multiple 
 threads without synchronization, but isn't marked volatile. I believe this is 
 responsible for about a 1% failure rate on 
 TestFileAppend4.testAppendSyncChecksum* on my 8-core test boxes (looking at 
 the logs, I see replication happening even though we've supposedly disabled it).
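The visibility hazard described here is the classic unsynchronized-flag problem. A minimal standalone illustration of why the volatile keyword matters (this is not the actual FSNamesystem code; names are stand-ins):

```java
public class VolatileFlagSketch {
    // Without 'volatile', the worker thread may cache the field and never
    // observe the main thread's update, spinning forever.
    private static volatile boolean stallReplicationWork = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stallReplicationWork) {
                Thread.onSpinWait();             // simulate replication work
            }
            System.out.println("replication stalled");
        });
        worker.start();
        Thread.sleep(100);
        stallReplicationWork = true;             // visible to worker because of volatile
        worker.join(1000);
        System.out.println("worker alive: " + worker.isAlive());
    }
}
```

With the volatile marker the worker reliably sees the flag flip and exits; dropping it makes the outcome depend on JIT and memory-model luck, which matches the ~1% flaky-test symptom.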





[jira] [Updated] (HDFS-1141) completeFile does not check lease ownership

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1141:
--

Fix Version/s: 0.20.205.0

+1 for the patch. I committed it to the 0.20-security branch.

 completeFile does not check lease ownership
 ---

 Key: HDFS-1141
 URL: https://issues.apache.org/jira/browse/HDFS-1141
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0, 0.22.0

 Attachments: HDFS-1141.20-security.1.patch, hdfs-1141-branch20.txt, 
 hdfs-1141.txt, hdfs-1141.txt


 completeFile should check that the caller still owns the lease of the file 
 that it's completing. This is for the 'testCompleteOtherLeaseHoldersFile' 
 case in HDFS-1139.





[jira] [Updated] (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1204:
--

Fix Version/s: 0.20.205.0

+1 for the patch. I committed it to the 0.20-security branch.

 0.20: Lease expiration should recover single files, not entire lease holder
 ---

 Key: HDFS-1204
 URL: https://issues.apache.org/jira/browse/HDFS-1204
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: sam rash
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-1204.20-security.1.patch, hdfs-1204.txt, 
 hdfs-1204.txt


 This was brought up in HDFS-200 but didn't make it into the branch on Apache.





[jira] [Commented] (HDFS-1118) DFSOutputStream socket leak when cannot connect to DataNode

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096362#comment-13096362
 ] 

Suresh Srinivas commented on HDFS-1118:
---

I have committed the patch to the 0.20-security branch.

 DFSOutputStream socket leak when cannot connect to DataNode
 ---

 Key: HDFS-1118
 URL: https://issues.apache.org/jira/browse/HDFS-1118
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.20-append, 0.21.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.20-append, 0.20.205.0, 0.22.0

 Attachments: HDFS-1118.1.patch, HDFS-1118.2.patch, 
 hdfs-1118.20s.patch, trunkPatch.txt


 The offending code is in {{DFSOutputStream.nextBlockOutputStream}}.
 This function retries several times to call {{createBlockOutputStream}}. Each 
 time it fails, it leaves a {{Socket}} object in {{DFSOutputStream.s}}.
 That object is never closed, but is overwritten the next time 
 {{createBlockOutputStream}} is called.
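An illustrative sketch of the leak pattern and its fix (not the actual DFSOutputStream code; names are stand-ins): close the previous Socket before overwriting the field on the next retry.

```java
import java.io.IOException;
import java.net.Socket;

public class SocketRetrySketch {
    private Socket s;

    boolean tryConnect() {
        closeQuietly(s);      // the fix: release the previous attempt's socket
        s = new Socket();     // unconnected socket standing in for a failed attempt
        return false;         // pretend the connect failed
    }

    static void closeQuietly(Socket sock) {
        if (sock == null) return;
        try { sock.close(); } catch (IOException e) { /* ignore on cleanup */ }
    }

    public static void main(String[] args) {
        SocketRetrySketch d = new SocketRetrySketch();
        for (int i = 0; i < 3; i++) d.tryConnect();   // retries no longer leak
        closeQuietly(d.s);                            // final cleanup
        System.out.println("last socket closed: " + d.s.isClosed());
    }
}
```

Without the closeQuietly call at the top of tryConnect, each retry would abandon an open Socket, leaking one file descriptor per failed attempt.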





[jira] [Commented] (HDFS-1202) DataBlockScanner throws NPE when updated before initialized

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096364#comment-13096364
 ] 

Suresh Srinivas commented on HDFS-1202:
---

I committed the patch to the 0.20-security branch.

 DataBlockScanner throws NPE when updated before initialized
 ---

 Key: HDFS-1202
 URL: https://issues.apache.org/jira/browse/HDFS-1202
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20-append, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.20-append, 0.20.205.0, 0.22.0

 Attachments: hdfs-1202-0.20-append.txt, hdfs-1202.20s.patch, 
 hdfs-1202.txt


 Missing an isInitialized() check in updateScanStatusInternal
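The one-line fix being described can be sketched as follows; the class and field names are stand-ins for DataBlockScanner internals, not the real code:

```java
public class InitGuardSketch {
    private Object blockMap = null;     // would be set during init()

    boolean isInitialized() { return blockMap != null; }

    void updateScanStatus() {
        if (!isInitialized()) return;   // the missing guard: early calls become no-ops
        blockMap.toString();            // would NPE if reached before init()
    }

    public static void main(String[] args) {
        new InitGuardSketch().updateScanStatus();   // called before initialization
        System.out.println("no NPE");
    }
}
```

The guard turns the "updated before initialized" ordering into a harmless no-op instead of a NullPointerException.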





[jira] [Updated] (HDFS-1346) DFSClient receives out of order packet ack

2011-09-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1346:
---

Attachment: HDFS-1346.20-security.1.patch

Patch for 20-security, ported from 20-append.

 DFSClient receives out of order packet ack
 --

 Key: HDFS-1346
 URL: https://issues.apache.org/jira/browse/HDFS-1346
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append

 Attachments: HDFS-1346.20-security.1.patch, blockrecv-diff.txt, 
 outOfOrder.patch


 When running 0.20 patched with HDFS-101, we sometimes see an error as follows:
 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block 
 blk_-2871223654872350746_21421120 java.io.IOException: Responseprocessor: 
 Expecting seqno for block blk_-2871223654872350746_21421120 10280 but received 10281
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2570)
 This indicates that the DFS client expects an ack for packet N but receives an 
 ack for packet N+1.
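The check that produces that log line can be sketched like this: the response processor compares the incoming ack's seqno against the head of its queue of pending packets (seqnos below are taken from the quoted log; the structure is illustrative, not the real ResponseProcessor):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class AckOrderSketch {
    public static void main(String[] args) {
        // Packets sent but not yet acked, in order.
        Deque<Integer> pending = new ArrayDeque<>(Arrays.asList(10280, 10281));
        int acked = 10281;                       // ack arrived out of order
        int expected = pending.peekFirst();      // head of the queue: 10280
        if (acked != expected) {
            System.out.println("Expecting seqno " + expected
                    + " but received " + acked);
        }
    }
}
```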





[jira] [Closed] (HDFS-1729) Improve metrics for measuring NN startup costs.

2011-09-02 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HDFS-1729.
---


 Improve metrics for measuring NN startup costs.
 ---

 Key: HDFS-1729
 URL: https://issues.apache.org/jira/browse/HDFS-1729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Owen O'Malley
Assignee: Matt Foley
 Fix For: 0.20.203.0


 Current logging and metrics are insufficient to diagnose latency problems in 
 cluster startup.  Add:
 1. better logs in both Datanode and Namenode for Initial Block Report 
 processing, to help distinguish between block report processing problems and 
 RPC/queuing problems;
 2. new logs to measure the cost of scanning all blocks for over/under/invalid 
 replicas, which occurs in the Namenode just before exiting safe mode;
 3. new logs to measure the cost of processing the under/invalid replica queues 
 (created by the above-mentioned scan), which occurs just after exiting safe 
 mode and is said to take 100% of CPU.





[jira] [Commented] (HDFS-1779) After NameNode restart , Clients can not read partial files even after client invokes Sync.

2011-09-02 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096449#comment-13096449
 ] 

Todd Lipcon commented on HDFS-1779:
---

Mostly looks great. Small nits:

- indentation in SimulatedFSDataset
- there are some hard tabs in rejectAddStoredBlock
- typo: 'unregisterted'

Hairong, do you have a unit test? I have half of one here, similar to Uma's.

 After NameNode restart , Clients can not read partial files even after client 
 invokes Sync.
 ---

 Key: HDFS-1779
 URL: https://issues.apache.org/jira/browse/HDFS-1779
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, name-node
Affects Versions: 0.20-append
 Environment: Linux
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 0.20-append

 Attachments: HDFS-1779.1.patch, HDFS-1779.patch, bbwReportAppend.patch


 In the append work (HDFS-200):
 If a file has 10 blocks and the client invokes the sync method after writing 5 
 blocks, the NN will persist the block information in the edits log. 
 If we then restart the NN, all the DataNodes will re-register with the NN, 
 but the DataNodes do not send the blocks-being-written information to the NN; 
 DNs only send the blocksBeingWritten information at DN startup. So the 
 NameNode cannot find which datanodes the 5 persisted blocks belong to. 
 This information can be built from the DNs' block reports. Otherwise we will 
 lose the information for these 5 blocks even though the NN persisted it in 
 the edits log.





[jira] [Updated] (HDFS-1346) DFSClient receives out of order packet ack

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1346:
--

Fix Version/s: 0.20.205.0

+1 for the patch. I committed it to the 0.20-security branch.

 DFSClient receives out of order packet ack
 --

 Key: HDFS-1346
 URL: https://issues.apache.org/jira/browse/HDFS-1346
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-1346.20-security.1.patch, blockrecv-diff.txt, 
 outOfOrder.patch


 When running 0.20 patched with HDFS-101, we sometimes see an error as follows:
 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block 
 blk_-2871223654872350746_21421120 java.io.IOException: Responseprocessor: 
 Expecting seqno for block blk_-2871223654872350746_21421120 10280 but received 10281
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2570)
 This indicates that the DFS client expects an ack for packet N but receives an 
 ack for packet N+1.





[jira] [Commented] (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096454#comment-13096454
 ] 

Suresh Srinivas commented on HDFS-724:
--

This patch does not compile for me.

 Pipeline close hangs if one of the datanode is not responsive.
 --

 Key: HDFS-724
 URL: https://issues.apache.org/jira/browse/HDFS-724
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.20-append, 0.21.0

 Attachments: HDFS-724.20-security.1.patch, h724_20091021.patch, 
 hbAckReply.patch, pipelineHeartbeat.patch, pipelineHeartbeat1.patch, 
 pipelineHeartbeat2.patch, stuckWriteAppend20.patch


 In the new pipeline design, pipeline close is implemented by sending an 
 additional empty packet.  If one of the datanodes does not respond to this 
 empty packet, the pipeline hangs.  It seems that there is no timeout.
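The missing timeout can be sketched with a bounded wait for the close-packet ack instead of an indefinite block; the latch stands in for the ack path and the 200 ms bound is invented for illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class CloseTimeoutSketch {
    public static void main(String[] args) throws InterruptedException {
        // Never counted down: stands in for a dead datanode that never acks
        // the empty close packet.
        CountDownLatch closeAck = new CountDownLatch(1);
        boolean acked = closeAck.await(200, TimeUnit.MILLISECONDS);
        if (!acked) {
            System.out.println("close ack timed out, aborting pipeline");
        }
    }
}
```

With a bound like this, an unresponsive datanode turns into a recoverable pipeline error rather than a hang.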





[jira] [Commented] (HDFS-1161) Make DN minimum valid volumes configurable

2011-09-02 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096475#comment-13096475
 ] 

Koji Noguchi commented on HDFS-1161:


bq. IIRC Koji's perspective was that an admin doesn't want to specify the count 
or percent of valid volumes

What I wanted was to keep the default behavior of shutting down the datanode 
when it hits a faulty volume for 0.21, since we were seeing missing blocks after 
HDFS-457. I don't have a preference between %disks and #disks.

 Make DN minimum valid volumes configurable
 --

 Key: HDFS-1161
 URL: https://issues.apache.org/jira/browse/HDFS-1161
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.21.0, 0.22.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.21.0

 Attachments: HDFS-1161-y20.patch, hdfs-1161-1.patch, 
 hdfs-1161-2.patch, hdfs-1161-3.patch, hdfs-1161-4.patch, hdfs-1161-5.patch, 
 hdfs-1161-6.patch


 The minimum number of non-faulty volumes needed to keep the DN active is 
 hard-coded to 1.  It would be useful to allow users to configure this value so 
 the DN can be taken offline when, e.g., half of its disks fail; otherwise the 
 failure doesn't get reported until the DN is down to its final disk and 
 suffering degraded performance.





[jira] [Updated] (HDFS-1779) After NameNode restart , Clients can not read partial files even after client invokes Sync.

2011-09-02 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-1779:
---

Fix Version/s: 0.20.205.0

Maybe we need to include this for the upcoming 0.20.205+ release that has 
append support.

 After NameNode restart , Clients can not read partial files even after client 
 invokes Sync.
 ---

 Key: HDFS-1779
 URL: https://issues.apache.org/jira/browse/HDFS-1779
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, name-node
Affects Versions: 0.20-append
 Environment: Linux
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-1779.1.patch, HDFS-1779.patch, bbwReportAppend.patch


 In the append work (HDFS-200):
 If a file has 10 blocks and the client invokes the sync method after writing 5 
 blocks, the NN will persist the block information in the edits log. 
 If we then restart the NN, all the DataNodes will re-register with the NN, 
 but the DataNodes do not send the blocks-being-written information to the NN; 
 DNs only send the blocksBeingWritten information at DN startup. So the 
 NameNode cannot find which datanodes the 5 persisted blocks belong to. 
 This information can be built from the DNs' block reports. Otherwise we will 
 lose the information for these 5 blocks even though the NN persisted it in 
 the edits log.





[jira] [Updated] (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1057:
--

Fix Version/s: 0.20.205.0

+1 for the patch. I committed it to the 0.20-security branch.

 Concurrent readers hit ChecksumExceptions if following a writer to very end 
 of file
 ---

 Key: HDFS-1057
 URL: https://issues.apache.org/jira/browse/HDFS-1057
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: sam rash
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0, 0.21.0, 0.22.0

 Attachments: HDFS-1057-0.20-append.patch, 
 HDFS-1057.20-security.1.patch, conurrent-reader-patch-1.txt, 
 conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, 
 hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt, 
 hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, hdfs-1057-trunk-6.txt


 In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before 
 calling flush(). Therefore, if there is a concurrent reader, it's possible to 
 race here - the reader will see the new length while those bytes are still in 
 the buffers of BlockReceiver. Thus the client will potentially see checksum 
 errors or EOFs. Additionally, the last checksum chunk of the file is made 
 accessible to readers even though it is not stable.
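The ordering fix implied above can be sketched as: flush the bytes first, then advance the length visible to readers, so a concurrent reader never sees a length covering bytes still sitting in the writer's buffers. Names are illustrative, not the real BlockReceiver:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class VisibleLengthSketch {
    private volatile long visibleLength = 0;   // what concurrent readers may read up to

    void receivePacket(OutputStream out, byte[] data) throws IOException {
        out.write(data);
        out.flush();                   // make the bytes readable on disk first...
        visibleLength += data.length;  // ...then let readers see the new length
    }

    public static void main(String[] args) throws IOException {
        VisibleLengthSketch s = new VisibleLengthSketch();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        s.receivePacket(out, new byte[]{1, 2, 3});
        System.out.println("visible=" + s.visibleLength + " onDisk=" + out.size());
    }
}
```

The invariant is visible length <= bytes actually flushed, which is exactly what the reported race (setBytesOnDisk before flush) violates.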





[jira] [Commented] (HDFS-2281) NPE in checkpoint during processIOError()

2011-09-02 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096495#comment-13096495
 ] 

Konstantin Shvachko commented on HDFS-2281:
---

+1 I'll commit it to 0.22 branch.

 NPE in checkpoint during processIOError()
 -

 Key: HDFS-2281
 URL: https://issues.apache.org/jira/browse/HDFS-2281
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Uma Maheswara Rao G
 Fix For: 0.22.0

 Attachments: BN-bug-NPE.txt, HDFS-2281.1.patch, HDFS-2281.patch


 At the end of a checkpoint the BackupNode tries to convergeJournalSpool() and 
 calls revertFileStreams(). The latter closes each file stream and tries to 
 rename the corresponding file to its permanent location, current/edits. If for 
 any reason the rename fails, processIOError() is called for the failed streams. 
 processIOError() will try to close the stream again and will get an NPE in 
 EditLogFileOutputStream.close(), because bufCurrent was set to null by the 
 previous close.
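The double-close NPE and the idempotent-close guard can be illustrated in miniature; the field and class names are stand-ins for EditLogFileOutputStream internals, not the actual code:

```java
public class IdempotentCloseSketch {
    private StringBuilder bufCurrent = new StringBuilder();

    void close() {
        if (bufCurrent == null) return;   // guard: a second close becomes a no-op
        // ... flush bufCurrent to disk in the real stream ...
        bufCurrent = null;                // first close releases the buffer
    }

    public static void main(String[] args) {
        IdempotentCloseSketch s = new IdempotentCloseSketch();
        s.close();
        s.close();   // would have dereferenced the null buffer without the guard
        System.out.println("double close ok");
    }
}
```

Making close() idempotent means processIOError() can safely close a stream that the rename path already closed.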





[jira] [Commented] (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096496#comment-13096496
 ] 

Suresh Srinivas commented on HDFS-724:
--

My bad, I had not applied the HDFS-1057 patch, which is required for this 
patch. +1 for the patch.

 Pipeline close hangs if one of the datanode is not responsive.
 --

 Key: HDFS-724
 URL: https://issues.apache.org/jira/browse/HDFS-724
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.20-append, 0.21.0

 Attachments: HDFS-724.20-security.1.patch, h724_20091021.patch, 
 hbAckReply.patch, pipelineHeartbeat.patch, pipelineHeartbeat1.patch, 
 pipelineHeartbeat2.patch, stuckWriteAppend20.patch


 In the new pipeline design, pipeline close is implemented by sending an 
 additional empty packet. If one of the datanodes does not respond to this 
 empty packet, the pipeline hangs; there appears to be no timeout.
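The missing timeout can be illustrated with a bounded wait for the final packet's ack. This is a hedged sketch, not the actual DFSClient code; the queue-based ack delivery is an assumption for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: waiting indefinitely for the close-packet ack is what hangs the
// pipeline; a bounded poll lets the client fail fast and recover instead.
class AckWaiter {
    // Wait for the ack of the final (empty) close packet, up to timeoutMs.
    static boolean waitForAck(BlockingQueue<Long> acks, long seqno, long timeoutMs)
            throws InterruptedException {
        Long ack = acks.poll(timeoutMs, TimeUnit.MILLISECONDS);
        return ack != null && ack == seqno;
    }
}
```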





[jira] [Commented] (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096499#comment-13096499
 ] 

Suresh Srinivas commented on HDFS-895:
--

+1 for the patch.

 Allow hflush/sync to occur in parallel with new writes to the file
 --

 Key: HDFS-895
 URL: https://issues.apache.org/jira/browse/HDFS-895
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: dhruba borthakur
Assignee: Todd Lipcon
 Fix For: 0.20-append, 0.22.0

 Attachments: 895-delta-for-review.txt, HDFS-895.20-security.1.patch, 
 hdfs-895-0.20-append.txt, hdfs-895-0.20-append.txt, hdfs-895-20.txt, 
 hdfs-895-branch-20-append.txt, hdfs-895-ontopof-1497.txt, 
 hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt, hdfs-895.txt, 
 hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt


 In the current trunk, the HDFS client methods writeChunk() and hflush()/sync() 
 are synchronized. This means that if an hflush/sync is in progress, an 
 application cannot write data to the HDFS client buffer, which reduces the 
 write throughput of the transaction log in HBase. 
 hflush/sync should allow new writes to the HDFS client even while an 
 hflush/sync is in progress. It can record the seqno of the message for which 
 it should receive the ack, indicate to the DataStreamer thread to start 
 flushing those messages, exit the synchronized section, and just wait for that 
 ack to arrive.
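The structure proposed above can be sketched as follows. All names here are illustrative stand-ins, not the real DFSOutputStream internals: the key point is that hflush snapshots the target seqno inside the lock, then waits for the ack outside it, so writers can keep filling the buffer.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: decouple "record which seqno to wait for" (under the buffer lock)
// from "wait for that ack" (outside the lock), so write() is never blocked
// for the duration of a flush.
class FlushSketch {
    private final Object bufferLock = new Object();
    private long lastQueuedSeqno = 0;
    private final AtomicLong lastAckedSeqno = new AtomicLong(0);

    void write(byte[] data) {
        synchronized (bufferLock) {
            lastQueuedSeqno++; // enqueue a packet; only a brief critical section
        }
    }

    void hflush() throws InterruptedException {
        long target;
        synchronized (bufferLock) {
            target = lastQueuedSeqno; // snapshot the seqno this flush must cover
        } // lock released here: concurrent write() calls proceed while we wait
        synchronized (lastAckedSeqno) {
            while (lastAckedSeqno.get() < target) {
                lastAckedSeqno.wait();
            }
        }
    }

    void ackReceived(long seqno) { // called by the streamer thread
        synchronized (lastAckedSeqno) {
            lastAckedSeqno.set(seqno);
            lastAckedSeqno.notifyAll();
        }
    }

    long getLastAcked() { return lastAckedSeqno.get(); }
}
```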





[jira] [Updated] (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-724:
-

Fix Version/s: 0.20.205.0

I committed the patch to 0.20-security branch.

 Pipeline close hangs if one of the datanode is not responsive.
 --

 Key: HDFS-724
 URL: https://issues.apache.org/jira/browse/HDFS-724
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.20-append, 0.20.205.0, 0.21.0

 Attachments: HDFS-724.20-security.1.patch, h724_20091021.patch, 
 hbAckReply.patch, pipelineHeartbeat.patch, pipelineHeartbeat1.patch, 
 pipelineHeartbeat2.patch, stuckWriteAppend20.patch


 In the new pipeline design, pipeline close is implemented by sending an 
 additional empty packet. If one of the datanodes does not respond to this 
 empty packet, the pipeline hangs; there appears to be no timeout.





[jira] [Updated] (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-895:
-

Fix Version/s: 0.20.205.0

I committed the patch to 0.20-security.

 Allow hflush/sync to occur in parallel with new writes to the file
 --

 Key: HDFS-895
 URL: https://issues.apache.org/jira/browse/HDFS-895
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: dhruba borthakur
Assignee: Todd Lipcon
 Fix For: 0.20-append, 0.20.205.0, 0.22.0

 Attachments: 895-delta-for-review.txt, HDFS-895.20-security.1.patch, 
 hdfs-895-0.20-append.txt, hdfs-895-0.20-append.txt, hdfs-895-20.txt, 
 hdfs-895-branch-20-append.txt, hdfs-895-ontopof-1497.txt, 
 hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt, hdfs-895.txt, 
 hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt


 In the current trunk, the HDFS client methods writeChunk() and hflush()/sync() 
 are synchronized. This means that if an hflush/sync is in progress, an 
 application cannot write data to the HDFS client buffer, which reduces the 
 write throughput of the transaction log in HBase. 
 hflush/sync should allow new writes to the HDFS client even while an 
 hflush/sync is in progress. It can record the seqno of the message for which 
 it should receive the ack, indicate to the DataStreamer thread to start 
 flushing those messages, exit the synchronized section, and just wait for that 
 ack to arrive.





[jira] [Commented] (HDFS-1520) HDFS 20 append: Lightweight NameNode operation to trigger lease recovery

2011-09-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096504#comment-13096504
 ] 

Suresh Srinivas commented on HDFS-1520:
---

+1 for the 0.20-security patch.

 HDFS 20 append: Lightweight NameNode operation to trigger lease recovery
 

 Key: HDFS-1520
 URL: https://issues.apache.org/jira/browse/HDFS-1520
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: name-node
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append

 Attachments: HDFS-1520.20-security.1.patch, recoverLeaseApache20.patch


 Currently HBase uses append to trigger the close of an HLog during HLog 
 split. Append is a very expensive operation, which involves not only NameNode 
 operations but also creating a write pipeline. If one of the datanodes in the 
 pipeline has a problem, this recovery may take minutes. I'd like to implement 
 a lightweight NameNode operation to trigger lease recovery and make HBase use 
 it instead.
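A hedged sketch of how a client such as HBase's log splitter might use such an operation. The `LeaseRecoverer` interface is a hypothetical stand-in for the proposed NameNode call; the poll-until-recovered shape is an assumption, not this patch's actual API.

```java
// Sketch: instead of an expensive append (NameNode ops + a write pipeline),
// the client polls a lightweight lease-recovery call until the file is closed.
interface LeaseRecoverer {
    // Returns true once the file's lease has been recovered and the file closed.
    boolean recoverLease(String path) throws Exception;
}

class LeaseRecoveryClient {
    static boolean recoverWithRetry(LeaseRecoverer fs, String path, int attempts,
                                    long sleepMs) throws Exception {
        for (int i = 0; i < attempts; i++) {
            if (fs.recoverLease(path)) {
                return true;        // lease revoked, blocks finalized; safe to read
            }
            Thread.sleep(sleepMs);  // recovery still in progress; poll again
        }
        return false;
    }
}
```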





[jira] [Updated] (HDFS-1520) HDFS 20 append: Lightweight NameNode operation to trigger lease recovery

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1520:
--

Fix Version/s: 0.20.205.0

I committed the patch to 0.20-security branch.

 HDFS 20 append: Lightweight NameNode operation to trigger lease recovery
 

 Key: HDFS-1520
 URL: https://issues.apache.org/jira/browse/HDFS-1520
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: name-node
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-1520.20-security.1.patch, recoverLeaseApache20.patch


 Currently HBase uses append to trigger the close of an HLog during HLog 
 split. Append is a very expensive operation, which involves not only NameNode 
 operations but also creating a write pipeline. If one of the datanodes in the 
 pipeline has a problem, this recovery may take minutes. I'd like to implement 
 a lightweight NameNode operation to trigger lease recovery and make HBase use 
 it instead.





[jira] [Updated] (HDFS-1555) HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1555:
--

Fix Version/s: 0.20.205.0

+1 for the patch. I have committed it to 0.20-security branch.

 HDFS 20 append: Disallow pipeline recovery if a file is already being lease 
 recovered
 -

 Key: HDFS-1555
 URL: https://issues.apache.org/jira/browse/HDFS-1555
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-1555.20-security.1.patch, appendRecoveryRace.patch, 
 recoveryRace.patch


 When a file is under lease recovery and the writer is still alive, the write 
 pipeline will be killed and the writer will then start a pipeline recovery. 
 Sometimes the pipeline recovery may race ahead of the lease recovery and, as 
 a result, cause the lease recovery to fail. This is very bad if we want to 
 support the strong recoverLease semantics of HDFS-1554, so it would be nice 
 if we could disallow a file's pipeline recovery while its lease recovery is 
 in progress.
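The proposed exclusion can be sketched as a per-file guard (names are illustrative, not the actual NameNode data structures): once lease recovery has started, any attempt to begin pipeline recovery is rejected.

```java
// Sketch: serialize the two recovery paths so a live writer's pipeline
// recovery cannot race ahead of an in-progress lease recovery.
class RecoveryGuard {
    private boolean leaseRecoveryInProgress = false;

    synchronized void startLeaseRecovery() { leaseRecoveryInProgress = true; }

    synchronized boolean tryStartPipelineRecovery() {
        if (leaseRecoveryInProgress) {
            return false; // writer must not race ahead of lease recovery
        }
        return true;
    }
}
```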





[jira] [Updated] (HDFS-1554) Append 0.20: New semantics for recoverLease

2011-09-02 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1554:
--

Fix Version/s: 0.20.205.0

+1 for the patch. I committed it to 0.20-security.

 Append 0.20: New semantics for recoverLease
 ---

 Key: HDFS-1554
 URL: https://issues.apache.org/jira/browse/HDFS-1554
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append, 0.20.205.0

 Attachments: HDFS-1554.20-security.1.patch, appendRecoverLease.patch, 
 appendRecoverLease1.patch


 The current recoverLease API implemented in append 0.20 aims to provide a 
 lighter-weight way (compared to using create/append) to trigger a file's soft 
 lease expiration. Based on the use cases of both HBase and Scribe, it could 
 have stronger semantics: revoking the file's lease, thus starting lease 
 recovery immediately.
 I'd also like to port this recoverLease API to HDFS 0.22 and trunk, since 
 HBase is moving to HDFS 0.22.
