[jira] [Commented] (HDFS-1952) FSEditLog.open() appears to succeed even if all EDITS directories fail
[ https://issues.apache.org/jira/browse/HDFS-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037746#comment-13037746 ] Matt Foley commented on HDFS-1952: -- +1 on the v22 version. Confirmed it compiles successfully, and the new unit test fails before the FSEditLog change and passes after. FSEditLog.open() appears to succeed even if all EDITS directories fail -- Key: HDFS-1952 URL: https://issues.apache.org/jira/browse/HDFS-1952 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Matt Foley Assignee: Andrew Wang Labels: newbie Attachments: hdfs-1952-0.22.patch, hdfs-1952.patch, hdfs-1952.patch, hdfs-1952.patch FSEditLog.open() appears to succeed even if all of the individual directories failed to allow creation of an EditLogOutputStream. The problem and solution are essentially similar to those of HDFS-1505. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
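The failure mode described above can be sketched as follows. This is a hypothetical, minimal illustration, not the actual FSEditLog code: the names EditsDir and openStream are invented stand-ins. The point of the HDFS-1952 fix is that open() should throw when no directory yields a usable stream, instead of returning with an empty stream list.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the guard HDFS-1952 adds: tolerate individual
// edits-directory failures, but fail loudly if none of them produced a
// usable output stream.
public class EditLogOpenSketch {
    // Stand-in for an edits directory; openStream() may throw on a bad disk.
    interface EditsDir {
        String openStream() throws IOException;
    }

    static List<String> open(EditsDir... dirs) throws IOException {
        List<String> streams = new ArrayList<>();
        for (EditsDir d : dirs) {
            try {
                streams.add(d.openStream());
            } catch (IOException e) {
                // A single failed directory is tolerated; keep trying the rest.
            }
        }
        if (streams.isEmpty()) {
            // The fix: surface total failure instead of silently succeeding.
            throw new IOException("Unable to open any edits directory");
        }
        return streams;
    }
}
```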
[jira] [Updated] (HDFS-1727) fsck command can display command usage if user passes any illegal argument
[ https://issues.apache.org/jira/browse/HDFS-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-1727: -- Description: If a user passes arguments like ./hadoop fsck -test -files -blocks -racks, fsck takes / as the path and displays information for the whole DFS regarding files, blocks, and racks. But this hides the user's mistake. Instead, we can display the command usage when any invalid argument like the above is passed. If a user passes an illegal optional argument like ./hadoop fsck /test -listcorruptfileblocks instead of ./hadoop fsck /test -list-corruptfileblocks, we can also display the proper command usage. was: If a user passes arguments like ./hadoop fsck -test -files -blocks -racks, fsck takes / as the path and displays information for the whole DFS regarding files, blocks, and racks. But this hides the user's mistake. Instead, we can display the usage when any invalid argument like the above is passed. Assignee: Uma Maheswara Rao G Summary: fsck command can display command usage if user passes any illegal argument (was: fsck command can display usage if user passes any other arguments with '-' ( other than -move, -delete, -files , -openforwrite, -blocks , -locations, -racks).) fsck command can display command usage if user passes any illegal argument -- Key: HDFS-1727 URL: https://issues.apache.org/jira/browse/HDFS-1727 Project: Hadoop HDFS Issue Type: Bug Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor If a user passes arguments like ./hadoop fsck -test -files -blocks -racks, fsck takes / as the path and displays information for the whole DFS regarding files, blocks, and racks. But this hides the user's mistake. Instead, we can display the command usage when any invalid argument like the above is passed.
If a user passes an illegal optional argument like ./hadoop fsck /test -listcorruptfileblocks instead of ./hadoop fsck /test -list-corruptfileblocks, we can also display the proper command usage. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1981) When namenode goes down while checkpointing and is started again, subsequent checkpointing always fails
When namenode goes down while checkpointing and is started again, subsequent checkpointing always fails -- Key: HDFS-1981 URL: https://issues.apache.org/jira/browse/HDFS-1981 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Environment: Linux Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 This scenario is applicable in both the NN and BNN case. When the namenode goes down after creating edits.new, on subsequent restart the divertFileStreams will not happen to edits.new, because the edits.new file is already present and its size is zero. So, on trying to saveCheckPoint, an exception occurs: 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Namenode has an edit log with timestamp of 2011-05-23 16:38:56 but new checkpoint was created using editlog with timestamp 2011-05-23 16:37:30. Checkpoint Aborted. Is this a bug, or is it the expected behaviour? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1978) All but first option in LIBHDFS_OPTS is ignored
[ https://issues.apache.org/jira/browse/HDFS-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037912#comment-13037912 ] Hudson commented on HDFS-1978: -- Integrated in Hadoop-Hdfs-trunk #675 (See [https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/675/]) HDFS-1978. All but first option in LIBHDFS_OPTS is ignored. Contributed by Eli Collins eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1126312 Files : * /hadoop/hdfs/trunk/src/c++/libhdfs/hdfsJniHelper.c * /hadoop/hdfs/trunk/CHANGES.txt All but first option in LIBHDFS_OPTS is ignored --- Key: HDFS-1978 URL: https://issues.apache.org/jira/browse/HDFS-1978 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 0.21.0 Environment: RHEL 5.5 JDK 1.6.0_24 Reporter: Brock Noland Assignee: Eli Collins Fix For: 0.22.0 Attachments: HDFS-1978.0.patch, hdfs-1978-1.patch In getJNIEnv, we go through LIBHDFS_OPTS with strtok and count the number of args, then create an array of options based on that information. But when we actually set up the options, we only use the first arg: the loop never advances to the next strtok token. I believe the fix is pasted inline.
{noformat}
Index: src/c++/libhdfs/hdfsJniHelper.c
===
--- src/c++/libhdfs/hdfsJniHelper.c (revision 1124544)
+++ src/c++/libhdfs/hdfsJniHelper.c (working copy)
@@ -442,6 +442,7 @@
     int argNum = 1;
     for (; argNum < noArgs; argNum++) {
         options[argNum].optionString = result; //optHadoopArg;
+        result = strtok( NULL, jvmArgDelims);
     }
 }
{noformat}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1982) Null pointer exception is thrown when NN restarts with a block lesser in size than the block that is present in DN1 but the generation stamp is greater in the NN
Null pointer exception is thrown when NN restarts with a block lesser in size than the block that is present in DN1 but the generation stamp is greater in the NN -- Key: HDFS-1982 URL: https://issues.apache.org/jira/browse/HDFS-1982 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append Environment: Linux Reporter: ramkrishna.s.vasudevan Fix For: 0.20-append Consider the following scenario. We have a cluster with one NN and 2 DNs. We write some file. One of the blocks is written in DN1 but not yet completed on DN2's local disk. Now DN1 gets killed, so pipeline recovery happens for the block with the size as in DN2, but the generation stamp gets updated in the NN. DN2 also gets killed. Now restart the NN and DN1. When the NN restarts, the block in the NN has a greater generation stamp but a lesser size than the one reported by DN1. This leads to a NullPointerException in the addStoredBlock API. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and is started again, subsequent checkpointing always fails
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037949#comment-13037949 ] Todd Lipcon commented on HDFS-1981: --- Hi Ramkrishna. Can you provide a unit test which shows this issue? It would be especially good to see such a test against 0.22, since HDFS-1073 will restructure all this code when it's merged into 0.23. When namenode goes down while checkpointing and is started again, subsequent checkpointing always fails -- Key: HDFS-1981 URL: https://issues.apache.org/jira/browse/HDFS-1981 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Environment: Linux Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 This scenario is applicable in both the NN and BNN case. When the namenode goes down after creating edits.new, on subsequent restart the divertFileStreams will not happen to edits.new, because the edits.new file is already present and its size is zero. So, on trying to saveCheckPoint, an exception occurs: 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Namenode has an edit log with timestamp of 2011-05-23 16:38:56 but new checkpoint was created using editlog with timestamp 2011-05-23 16:37:30. Checkpoint Aborted. Is this a bug, or is it the expected behaviour? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
[ https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HDFS-1950: - Attachment: HDFS-1950-2.patch Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly. Key: HDFS-1950 URL: https://issues.apache.org/jira/browse/HDFS-1950 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, name-node Affects Versions: 0.20-append Reporter: ramkrishna.s.vasudevan Fix For: 0.20-append Attachments: HDFS-1950-2.patch Before going to the root cause, let's see the read behaviour for a file having more than 10 blocks in the append case. Logic: There is a prefetch size dfs.read.prefetch.size for the DFSInputStream, which has a default value of 10. This prefetch size is the number of blocks that the client will fetch from the namenode for reading a file. For example, let's assume that a file X having 22 blocks is residing in HDFS. The reader first fetches the first 10 blocks from the namenode and starts reading. After the above step, the reader fetches the next 10 blocks from the NN and continues reading. Then the reader fetches the remaining 2 blocks from the NN and completes the read. Cause: === Let's see the cause for this issue now. The scenario that will fail is: the writer wrote 10+ blocks and a partial block, then called sync. A reader trying to read the file will not get the last partial block. The client first gets the 10 block locations from the NN. Then it checks whether the file is under construction, and if so it gets the size of the last partial block from the datanode and reads the full file. However, when the number of blocks is more than 10, the last block will not be in the first fetch. It will be in the second or a later fetch (the last block will be in the (num of blocks / 10)th fetch). The problem now is that in DFSClient there is no logic to get the size of the last partial block (as in the single-fetch case) for the fetches other than the first, so the reader will not be able to read the complete synced data. Also, the InputStream.available API uses the first fetched block's size to iterate; ideally this size has to be increased. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
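The fetch arithmetic described above can be illustrated with a small, hypothetical sketch (the method name is invented, not part of DFSClient): with a prefetch size of 10, the last block of a file only appears in the ((numBlocks - 1) / prefetchSize)-th fetch, counting from 0, so logic that inspects only the first fetch misses it once a file passes 10 blocks.

```java
// Hypothetical illustration of which fetch contains a file's last block,
// given the dfs.read.prefetch.size batching described in HDFS-1950.
public class PrefetchMath {
    // Returns the 0-based index of the fetch that contains the last block.
    static int fetchContainingLastBlock(int numBlocks, int prefetchSize) {
        return (numBlocks - 1) / prefetchSize;
    }
}
```

For the 22-block file X in the example, the last block lives in the third fetch (index 2), which is why only fixing the first fetch is not enough.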
[jira] [Updated] (HDFS-1949) Number format Exception is displayed in Namenode UI when the chunk size field is blank or string value..
[ https://issues.apache.org/jira/browse/HDFS-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HDFS-1949: - Attachment: hdfs-1949.patch Number format Exception is displayed in Namenode UI when the chunk size field is blank or string value.. - Key: HDFS-1949 URL: https://issues.apache.org/jira/browse/HDFS-1949 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append, 0.21.0, 0.23.0 Reporter: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1949.patch, hdfs-1949.patch In the Namenode UI we have a text box to enter the chunk size. The expected value for the chunk size is a valid integer. If any invalid value, such as a string or empty spaces, is provided, it throws a NumberFormatException. The expected behaviour is that the default value should be used if no valid value is specified. Solution: we can handle the NumberFormatException and assign the default value if an invalid value is specified. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
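A minimal sketch of the fix HDFS-1949 proposes might look like the following. The method name and the default value here are illustrative assumptions, not the actual patch: blank or non-numeric input falls back to a default instead of propagating a NumberFormatException to the UI.

```java
// Hypothetical sketch of defaulting on invalid chunk-size input, in the
// spirit of the HDFS-1949 solution. DEFAULT_CHUNK_SIZE is an invented value.
public class ChunkSizeParser {
    static final int DEFAULT_CHUNK_SIZE = 32768;

    static int parseChunkSize(String input) {
        if (input == null || input.trim().isEmpty()) {
            return DEFAULT_CHUNK_SIZE; // blank field: use the default
        }
        try {
            return Integer.parseInt(input.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_CHUNK_SIZE; // string or other invalid value
        }
    }
}
```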
[jira] [Updated] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
[ https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HDFS-1950: - Status: Patch Available (was: Open) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly. Key: HDFS-1950 URL: https://issues.apache.org/jira/browse/HDFS-1950 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, name-node Affects Versions: 0.20-append Reporter: ramkrishna.s.vasudevan Fix For: 0.20-append Attachments: HDFS-1950-2.patch Before going to the root cause, let's see the read behaviour for a file having more than 10 blocks in the append case. Logic: There is a prefetch size dfs.read.prefetch.size for the DFSInputStream, which has a default value of 10. This prefetch size is the number of blocks that the client will fetch from the namenode for reading a file. For example, let's assume that a file X having 22 blocks is residing in HDFS. The reader first fetches the first 10 blocks from the namenode and starts reading. After the above step, the reader fetches the next 10 blocks from the NN and continues reading. Then the reader fetches the remaining 2 blocks from the NN and completes the read. Cause: === Let's see the cause for this issue now. The scenario that will fail is: the writer wrote 10+ blocks and a partial block, then called sync. A reader trying to read the file will not get the last partial block. The client first gets the 10 block locations from the NN. Then it checks whether the file is under construction, and if so it gets the size of the last partial block from the datanode and reads the full file. However, when the number of blocks is more than 10, the last block will not be in the first fetch. It will be in the second or a later fetch (the last block will be in the (num of blocks / 10)th fetch). The problem now is that in DFSClient there is no logic to get the size of the last partial block (as in the single-fetch case) for the fetches other than the first, so the reader will not be able to read the complete synced data. Also, the InputStream.available API uses the first fetched block's size to iterate; ideally this size has to be increased. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1949) Number format Exception is displayed in Namenode UI when the chunk size field is blank or string value..
[ https://issues.apache.org/jira/browse/HDFS-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HDFS-1949: - Status: Patch Available (was: Open) Number format Exception is displayed in Namenode UI when the chunk size field is blank or string value.. - Key: HDFS-1949 URL: https://issues.apache.org/jira/browse/HDFS-1949 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0, 0.20-append, 0.23.0 Reporter: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1949.patch, hdfs-1949.patch In the Namenode UI we have a text box to enter the chunk size. The expected value for the chunk size is a valid integer. If any invalid value, such as a string or empty spaces, is provided, it throws a NumberFormatException. The expected behaviour is that the default value should be used if no valid value is specified. Solution: we can handle the NumberFormatException and assign the default value if an invalid value is specified. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1949) Number format Exception is displayed in Namenode UI when the chunk size field is blank or string value..
[ https://issues.apache.org/jira/browse/HDFS-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HDFS-1949: - Status: Open (was: Patch Available) Number format Exception is displayed in Namenode UI when the chunk size field is blank or string value.. - Key: HDFS-1949 URL: https://issues.apache.org/jira/browse/HDFS-1949 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0, 0.20-append, 0.23.0 Reporter: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1949.patch, hdfs-1949.patch In the Namenode UI we have a text box to enter the chunk size. The expected value for the chunk size is a valid integer. If any invalid value, such as a string or empty spaces, is provided, it throws a NumberFormatException. The expected behaviour is that the default value should be used if no valid value is specified. Solution: we can handle the NumberFormatException and assign the default value if an invalid value is specified. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
[ https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037981#comment-13037981 ] Hadoop QA commented on HDFS-1950: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12480115/HDFS-1950-2.patch against trunk revision 1126312. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/612//console This message is automatically generated. Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly. Key: HDFS-1950 URL: https://issues.apache.org/jira/browse/HDFS-1950 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, name-node Affects Versions: 0.20-append Reporter: ramkrishna.s.vasudevan Fix For: 0.20-append Attachments: HDFS-1950-2.patch Before going to the root cause, let's see the read behaviour for a file having more than 10 blocks in the append case. Logic: There is a prefetch size dfs.read.prefetch.size for the DFSInputStream, which has a default value of 10. This prefetch size is the number of blocks that the client will fetch from the namenode for reading a file. For example, let's assume that a file X having 22 blocks is residing in HDFS. The reader first fetches the first 10 blocks from the namenode and starts reading. After the above step, the reader fetches the next 10 blocks from the NN and continues reading. Then the reader fetches the remaining 2 blocks from the NN and completes the read. Cause: === Let's see the cause for this issue now. The scenario that will fail is: the writer wrote 10+ blocks and a partial block, then called sync. A reader trying to read the file will not get the last partial block. The client first gets the 10 block locations from the NN. Then it checks whether the file is under construction, and if so it gets the size of the last partial block from the datanode and reads the full file. However, when the number of blocks is more than 10, the last block will not be in the first fetch. It will be in the second or a later fetch (the last block will be in the (num of blocks / 10)th fetch). The problem now is that in DFSClient there is no logic to get the size of the last partial block (as in the single-fetch case) for the fetches other than the first, so the reader will not be able to read the complete synced data. Also, the InputStream.available API uses the first fetched block's size to iterate; ideally this size has to be increased. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
[ https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037985#comment-13037985 ] ramkrishna.s.vasudevan commented on HDFS-1950: -- This patch applies to the 0.20-append branch. Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly. Key: HDFS-1950 URL: https://issues.apache.org/jira/browse/HDFS-1950 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, name-node Affects Versions: 0.20-append Reporter: ramkrishna.s.vasudevan Fix For: 0.20-append Attachments: HDFS-1950-2.patch Before going to the root cause, let's see the read behaviour for a file having more than 10 blocks in the append case. Logic: There is a prefetch size dfs.read.prefetch.size for the DFSInputStream, which has a default value of 10. This prefetch size is the number of blocks that the client will fetch from the namenode for reading a file. For example, let's assume that a file X having 22 blocks is residing in HDFS. The reader first fetches the first 10 blocks from the namenode and starts reading. After the above step, the reader fetches the next 10 blocks from the NN and continues reading. Then the reader fetches the remaining 2 blocks from the NN and completes the read. Cause: === Let's see the cause for this issue now. The scenario that will fail is: the writer wrote 10+ blocks and a partial block, then called sync. A reader trying to read the file will not get the last partial block. The client first gets the 10 block locations from the NN. Then it checks whether the file is under construction, and if so it gets the size of the last partial block from the datanode and reads the full file. However, when the number of blocks is more than 10, the last block will not be in the first fetch. It will be in the second or a later fetch (the last block will be in the (num of blocks / 10)th fetch). The problem now is that in DFSClient there is no logic to get the size of the last partial block (as in the single-fetch case) for the fetches other than the first, so the reader will not be able to read the complete synced data. Also, the InputStream.available API uses the first fetched block's size to iterate; ideally this size has to be increased. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037996#comment-13037996 ] Jonathan Hsieh commented on HDFS-1787: -- Actually, I assumed that the test focused on the patch in the 'changes' section of the jenkins result of build 608. This actually ran the newly added test case from the HDFS-1787 patch. The org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.pipeline_Fi_30 test seems to be intermittently failing. It also isn't reported by hudson. Is there a reason why? Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Fix For: 0.23.0 Attachments: hdfs-1787.2.patch, hdfs-1787.3.patch, hdfs-1787.3.patch, hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver-limit-exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers, or have some error code for generic errors to which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1949) Number format Exception is displayed in Namenode UI when the chunk size field is blank or string value..
[ https://issues.apache.org/jira/browse/HDFS-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038005#comment-13038005 ] Hadoop QA commented on HDFS-1949: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12480116/hdfs-1949.patch against trunk revision 1126312. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. -1 release audit. The applied patch generated 1 release audit warning (more than the trunk's current 0 warnings). -1 core tests. The patch failed these core unit tests: org.apache.hadoop.hdfs.TestDFSStorageStateRecovery +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/611//testReport/ Release audit warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/611//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/611//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/611//console This message is automatically generated. Number format Exception is displayed in Namenode UI when the chunk size field is blank or string value..
- Key: HDFS-1949 URL: https://issues.apache.org/jira/browse/HDFS-1949 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append, 0.21.0, 0.23.0 Reporter: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1949.patch, hdfs-1949.patch In the Namenode UI we have a text box to enter the chunk size. The expected value for the chunk size is a valid integer. If any invalid value, such as a string or empty spaces, is provided, it throws a NumberFormatException. The expected behaviour is that the default value should be used if no valid value is specified. Solution: we can handle the NumberFormatException and assign the default value if an invalid value is specified. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1965) IPCs done using block token-based tickets can't reuse connections
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038055#comment-13038055 ] Tsz Wo (Nicholas), SZE commented on HDFS-1965: -- A question came up: by setting maxidletime to 0, is there a race condition where the timeout occurs before the first call, i.e. the proxy is closed before the first call? IPCs done using block token-based tickets can't reuse connections - Key: HDFS-1965 URL: https://issues.apache.org/jira/browse/HDFS-1965 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.22.0 Attachments: hdfs-1965-0.22.txt, hdfs-1965.txt, hdfs-1965.txt, hdfs-1965.txt This is the reason that TestFileConcurrentReaders has been failing a lot. Reproducing a comment from HDFS-1057: The test has a thread which continually re-opens the file which is being written to. Since the file's in the middle of being written, it makes an RPC to the DataNode in order to determine the visible length of the file. This RPC is authenticated using the block token which came back in the LocatedBlocks object as the security ticket. When this RPC hits the IPC layer, it looks at its existing connections and sees none that can be re-used, since the block token differs between the two requesters. Hence, it reconnects, and we end up with hundreds or thousands of IPC connections to the datanode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-1828) TestBlocksWithNotEnoughRacks intermittently fails assert
[ https://issues.apache.org/jira/browse/HDFS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-1828. -- Resolution: Fixed Assignee: Matt Foley Since we are not reverting the patch, re-closing this. If the test is still failing, please create a new issue. TestBlocksWithNotEnoughRacks intermittently fails assert Key: HDFS-1828 URL: https://issues.apache.org/jira/browse/HDFS-1828 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.23.0 Reporter: Matt Foley Assignee: Matt Foley Fix For: 0.23.0 Attachments: TestBlocksWithNotEnoughRacks.java.patch, TestBlocksWithNotEnoughRacks_v2.patch In server.namenode.TestBlocksWithNotEnoughRacks.testSufficientlyReplicatedBlocksWithNotEnoughRacks the assert fails at curReplicas == REPLICATION_FACTOR, but it seems that the count should go higher initially, and if the test doesn't wait for it to go back down, it will fail with a false positive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1965) IPCs done using block token-based tickets can't reuse connections
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038072#comment-13038072 ] Todd Lipcon commented on HDFS-1965: --- I think in trunk, it's not possible, since the connection is only lazily opened by the actual RPC to the DataNode. Then, it won't close since there's a call outstanding. In 0.22, it's possible that it will open one connection for the getProtocolVersion() call and a second one for the actual RPC. Unless I'm missing something, that should only be an efficiency issue and not a correctness issue. Do you agree? IPCs done using block token-based tickets can't reuse connections - Key: HDFS-1965 URL: https://issues.apache.org/jira/browse/HDFS-1965 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.22.0 Attachments: hdfs-1965-0.22.txt, hdfs-1965.txt, hdfs-1965.txt, hdfs-1965.txt This is the reason that TestFileConcurrentReaders has been failing a lot. Reproducing a comment from HDFS-1057: The test has a thread which continually re-opens the file which is being written to. Since the file's in the middle of being written, it makes an RPC to the DataNode in order to determine the visible length of the file. This RPC is authenticated using the block token which came back in the LocatedBlocks object as the security ticket. When this RPC hits the IPC layer, it looks at its existing connections and sees none that can be re-used, since the block token differs between the two requesters. Hence, it reconnects, and we end up with hundreds or thousands of IPC connections to the datanode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1965) IPCs done using block token-based tickets can't reuse connections
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038076#comment-13038076 ] Tsz Wo (Nicholas), SZE commented on HDFS-1965: -- Okay, I'm fine with it since it is only a temporary fix. +1, the 0.22 patch looks good. IPCs done using block token-based tickets can't reuse connections - Key: HDFS-1965 URL: https://issues.apache.org/jira/browse/HDFS-1965 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.22.0 Attachments: hdfs-1965-0.22.txt, hdfs-1965.txt, hdfs-1965.txt, hdfs-1965.txt This is the reason that TestFileConcurrentReaders has been failing a lot. Reproducing a comment from HDFS-1057: The test has a thread which continually re-opens the file which is being written to. Since the file's in the middle of being written, it makes an RPC to the DataNode in order to determine the visible length of the file. This RPC is authenticated using the block token which came back in the LocatedBlocks object as the security ticket. When this RPC hits the IPC layer, it looks at its existing connections and sees none that can be re-used, since the block token differs between the two requesters. Hence, it reconnects, and we end up with hundreds or thousands of IPC connections to the datanode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1853) refactor TestNodeCount to import standard node counting and wait for replication methods
[ https://issues.apache.org/jira/browse/HDFS-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1853: - Issue Type: Improvement (was: Sub-task) Parent: (was: HDFS-1852) refactor TestNodeCount to import standard node counting and wait for replication methods -- Key: HDFS-1853 URL: https://issues.apache.org/jira/browse/HDFS-1853 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 0.22.0 Reporter: Matt Foley Eli's suggestions for refactoring the three wait for loops in TestNodeCount for re-usability (similar to what was done for HDFS-1562): You could augment NameNodeAdapter#getReplicaInfo to return excess and live replica counts as well and then just add waitFor[Live|Excess]Replicas methods to DFSTestUtil and have TestNodeCount call them. This way we could re-use them in the other replication tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1853) refactor TestNodeCount to import standard node counting and wait for replication methods
[ https://issues.apache.org/jira/browse/HDFS-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038098#comment-13038098 ] Matt Foley commented on HDFS-1853: -- Removed from HDFS-1852 umbrella task, since not related to recurring Hudson test failures. refactor TestNodeCount to import standard node counting and wait for replication methods -- Key: HDFS-1853 URL: https://issues.apache.org/jira/browse/HDFS-1853 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 0.22.0 Reporter: Matt Foley Eli's suggestions for refactoring the three wait for loops in TestNodeCount for re-usability (similar to what was done for HDFS-1562): You could augment NameNodeAdapter#getReplicaInfo to return excess and live replica counts as well and then just add waitFor[Live|Excess]Replicas methods to DFSTestUtil and have TestNodeCount call them. This way we could re-use them in the other replication tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-1853) refactor TestNodeCount to import standard node counting and wait for replication methods
[ https://issues.apache.org/jira/browse/HDFS-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley reassigned HDFS-1853: Assignee: Matt Foley refactor TestNodeCount to import standard node counting and wait for replication methods -- Key: HDFS-1853 URL: https://issues.apache.org/jira/browse/HDFS-1853 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 0.22.0 Reporter: Matt Foley Assignee: Matt Foley Eli's suggestions for refactoring the three wait for loops in TestNodeCount for re-usability (similar to what was done for HDFS-1562): You could augment NameNodeAdapter#getReplicaInfo to return excess and live replica counts as well and then just add waitFor[Live|Excess]Replicas methods to DFSTestUtil and have TestNodeCount call them. This way we could re-use them in the other replication tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
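Eli's suggestion above amounts to replacing TestNodeCount's three hand-rolled wait loops with one shared polling helper. A minimal sketch of what such a DFSTestUtil-style method could look like (the name waitForReplicas and the Supplier-based signature are illustrative, not the actual API):

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hedged sketch of a reusable wait-for-replication helper: poll a
// replica-count supplier until it reaches the expected value, or fail
// with a TimeoutException carrying the last observed count.
class ReplicaWait {
    static void waitForReplicas(Supplier<Integer> count, int expected,
                                long timeoutMs, long pollMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (count.get() != expected) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("replicas=" + count.get()
                        + ", expected " + expected);
            }
            Thread.sleep(pollMs);
        }
    }
}
```

A test would then pass a lambda that queries the namenode (e.g. via NameNodeAdapter) for live or excess replica counts, so the timeout and polling policy live in one place instead of being copied into each replication test.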
[jira] [Commented] (HDFS-1401) TestFileConcurrentReader test case is still timing out / failing
[ https://issues.apache.org/jira/browse/HDFS-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038112#comment-13038112 ] Matt Foley commented on HDFS-1401: -- As of build 611 (May 23), we see: TestFileConcurrentReader.testUnfinishedBlockCRCErrorTransferToVerySmallWrite failed almost every build through 604, but has passed the last five builds in which auto-test ran. This may be fixed, but still needs to be watched for intermittent failure. TestFileConcurrentReader.testUnfinishedBlockCRCErrorNormalTransferVerySmallWrite and TestFileConcurrentReader.testUnfinishedBlockCRCErrorNormalTransfer failed intermittently through build 600, but have not failed since. However, their failures are infrequent, skipping six or eight builds between occurrences. They remain on the watch list. TestFileConcurrentReader test case is still timing out / failing Key: HDFS-1401 URL: https://issues.apache.org/jira/browse/HDFS-1401 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Affects Versions: 0.22.0 Reporter: Tanping Wang Priority: Critical Attachments: HDFS-1401.patch The unit test case, TestFileConcurrentReader after its most recent fix in HDFS-1310 still times out when using java 1.6.0_07. When using java 1.6.0_07, the test case simply hangs. On apache Hudson build ( which possibly is using a higher sub-version of java) this test case has presented an inconsistent test result that it sometimes passes, some times fails. For example, between the most recent build 423, 424 and build 425, there is no effective change, however, the test case failed on build 424 and passed on build 425 build 424 test failed https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk/424/testReport/org.apache.hadoop.hdfs/TestFileConcurrentReader/ build 425 test passed https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk/425/testReport/org.apache.hadoop.hdfs/TestFileConcurrentReader/ -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1401) TestFileConcurrentReader test case is still timing out / failing
[ https://issues.apache.org/jira/browse/HDFS-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038119#comment-13038119 ] sam rash commented on HDFS-1401: see todd's find in: https://issues.apache.org/jira/browse/HDFS-1057 TestFileConcurrentReader test case is still timing out / failing Key: HDFS-1401 URL: https://issues.apache.org/jira/browse/HDFS-1401 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Affects Versions: 0.22.0 Reporter: Tanping Wang Priority: Critical Attachments: HDFS-1401.patch The unit test case, TestFileConcurrentReader after its most recent fix in HDFS-1310 still times out when using java 1.6.0_07. When using java 1.6.0_07, the test case simply hangs. On apache Hudson build ( which possibly is using a higher sub-version of java) this test case has presented an inconsistent test result that it sometimes passes, some times fails. For example, between the most recent build 423, 424 and build 425, there is no effective change, however, the test case failed on build 424 and passed on build 425 build 424 test failed https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk/424/testReport/org.apache.hadoop.hdfs/TestFileConcurrentReader/ build 425 test passed https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk/425/testReport/org.apache.hadoop.hdfs/TestFileConcurrentReader/ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1852) Umbrella task: Clean up HDFS unit test recurring failures
[ https://issues.apache.org/jira/browse/HDFS-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038135#comment-13038135 ] Matt Foley commented on HDFS-1852: -- Besides the three remaining open issues above, we also have three infrequent-intermittent issues that may still exist. All were last seen in build 594, so it is possible they were addressed. org.apache.hadoop.cli.TestHDFSCLI.testAll - v intermittent, last 566, 579, 587, 594 org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.testErrorReplicas - v intermittent, last 559, 566, 579, 594 org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testBlockReplacement - v intermittent, last 565, 578, 594 Recording here for watchlist purposes. Umbrella task: Clean up HDFS unit test recurring failures -- Key: HDFS-1852 URL: https://issues.apache.org/jira/browse/HDFS-1852 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.22.0 Reporter: Matt Foley Recurring failures and false positives undermine CI by encouraging developers to ignore unit test failures. Let's clean these up! Some are intermittent due to timing-sensitive conditions. The unit tests for background thread activities (such as block replication and corrupt replica detection) often use wait while or wait until loops to detect results. The quality and robustness of these loops vary widely, and common usages should be moved to DFSTestUtil. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-236) Random read benchmark for DFS
[ https://issues.apache.org/jira/browse/HDFS-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Thompson updated HDFS-236: --- Attachment: RndRead-TestDFSIO.patch I've taken Raghu's patch from 6/27/09 with random read TestDFSIO enhancement, and ported it to the latest (now mapreduce) trunk 5/4/11 svn rev 1099590. Patch attached RndRead-TestDFSIO.patch. enjoy, Dave Random read benchmark for DFS - Key: HDFS-236 URL: https://issues.apache.org/jira/browse/HDFS-236 Project: Hadoop HDFS Issue Type: New Feature Reporter: Raghu Angadi Assignee: Raghu Angadi Attachments: HDFS-236.patch, RndRead-TestDFSIO.patch We should have at least one random read benchmark that can be run with rest of Hadoop benchmarks regularly. Please provide benchmark ideas or requirements. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1983) Fix path display for copy rm
Fix path display for copy rm -- Key: HDFS-1983 URL: https://issues.apache.org/jira/browse/HDFS-1983 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.23.0 Reporter: Daryn Sharp Assignee: Daryn Sharp This will also fix a few misc broken tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-988: - Status: Open (was: Patch Available) removing patch available status since this still needs to be finished up. saveNamespace can corrupt edits log --- Key: HDFS-988 URL: https://issues.apache.org/jira/browse/HDFS-988 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0, 0.20-append, 0.22.0 Reporter: dhruba borthakur Assignee: Todd Lipcon Priority: Blocker Fix For: 0.20-append, 0.22.0 Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch The administrator puts the namenode in safemode and then issues the savenamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when executed, would save an edits log with partial writes. I have seen this happen on 0.20. https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1965) IPCs done using block token-based tickets can't reuse connections
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1965: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed the 22 patch. Thanks, Nicholas. HADOOP-7317 tracks the real underlying issue. IPCs done using block token-based tickets can't reuse connections - Key: HDFS-1965 URL: https://issues.apache.org/jira/browse/HDFS-1965 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.22.0 Attachments: hdfs-1965-0.22.txt, hdfs-1965.txt, hdfs-1965.txt, hdfs-1965.txt This is the reason that TestFileConcurrentReaders has been failing a lot. Reproducing a comment from HDFS-1057: The test has a thread which continually re-opens the file which is being written to. Since the file's in the middle of being written, it makes an RPC to the DataNode in order to determine the visible length of the file. This RPC is authenticated using the block token which came back in the LocatedBlocks object as the security ticket. When this RPC hits the IPC layer, it looks at its existing connections and sees none that can be re-used, since the block token differs between the two requesters. Hence, it reconnects, and we end up with hundreds or thousands of IPC connections to the datanode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-988: Summary: saveNamespace can corrupt edits log, apparently due to race conditions (was: saveNamespace can corrupt edits log) saveNamespace can corrupt edits log, apparently due to race conditions -- Key: HDFS-988 URL: https://issues.apache.org/jira/browse/HDFS-988 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append, 0.21.0, 0.22.0 Reporter: dhruba borthakur Assignee: Todd Lipcon Priority: Blocker Fix For: 0.20-append, 0.22.0 Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch The administrator puts the namenode in safemode and then issues the savenamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when executed, would save an edits log with partial writes. I have seen this happen on 0.20. https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1603) Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.)
[ https://issues.apache.org/jira/browse/HDFS-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038245#comment-13038245 ] Todd Lipcon commented on HDFS-1603: --- ATM and I just brainstormed about this a little bit over some iced coffee. Though on the surface it doesn't look too hard to implement timeouts on namedir operations, it would actually have to be done in a lot of places (eg mkdirs/move calls on storage directories, writing edits, saving images, etc). Timing out some of these things isn't entirely straightforward, since the underlying calls aren't interruptible. At some point we could attempt to tackle it, but it looks like a complicated project. So, rather than trying to implement this in software for now, it's probably better to just recommend the proper NFS mount options when storing name dirs on NFS. Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.) - Key: HDFS-1603 URL: https://issues.apache.org/jira/browse/HDFS-1603 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0 Reporter: Konstantin Boudnik While investigating failures on HDFS-1602 it became apparent that once a namenode storage volume is pulled out NN becomes completely sticky until {{FSImage:processIOError: removing storage}} moves the storage from the active set. During this time none of the normal NN operations are possible (e.g. creating a directory on HDFS eventually times out). In the case of NFS this can be worked around with soft,intr,timeo,retrans settings. However, a better handling of the situation is apparently possible and needs to be implemented. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
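For reference, the mount options named in the issue description translate to an fstab entry along these lines (server name, export path, and timeout values here are purely illustrative, not a recommendation for any particular filer):

```shell
# Example /etc/fstab entry for an NFS-hosted name dir: "soft" makes I/O
# return an error instead of retrying forever, "intr" allows the blocked
# thread to be interrupted, and timeo (in tenths of a second) together
# with retrans bounds how long a dead server can stall the NameNode.
nfs-filer:/export/namedir  /mnt/namedir  nfs  soft,intr,timeo=30,retrans=3  0 0
```

Note that soft mounts trade hang-avoidance for the possibility of I/O errors during transient outages, which is presumably why the comment frames this as an operational recommendation rather than a software fix.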
[jira] [Updated] (HDFS-1967) TestHDFSTrash failing on trunk and 22
[ https://issues.apache.org/jira/browse/HDFS-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1967: - Issue Type: Sub-task (was: Bug) Parent: HDFS-1852 TestHDFSTrash failing on trunk and 22 - Key: HDFS-1967 URL: https://issues.apache.org/jira/browse/HDFS-1967 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 0.22.0 Reporter: Todd Lipcon Fix For: 0.22.0 Seems to have started failing recently in many commit builds as well as the last two nightly builds of 22: https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/51/testReport/org.apache.hadoop.hdfs/TestHDFSTrash/testTrashEmptier/ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1967) TestHDFSTrash failing on trunk and 22
[ https://issues.apache.org/jira/browse/HDFS-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038276#comment-13038276 ] Matt Foley commented on HDFS-1967: -- TestHDFSTrash.testTrashEmptier() was failing on almost every Hudson build through 605. However, it has not failed for the last four auto-test builds. Watch-listing for trunk. However, we'd like to understand what fixed it (if it is fixed) so we can apply the patch to v22 and yahoo-merge branches. TestHDFSTrash failing on trunk and 22 - Key: HDFS-1967 URL: https://issues.apache.org/jira/browse/HDFS-1967 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 0.22.0 Reporter: Todd Lipcon Fix For: 0.22.0 Seems to have started failing recently in many commit builds as well as the last two nightly builds of 22: https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/51/testReport/org.apache.hadoop.hdfs/TestHDFSTrash/testTrashEmptier/ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1984) HDFS-1073: Enable multiple checkpointers to run simultaneously
[ https://issues.apache.org/jira/browse/HDFS-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1984: -- Component/s: name-node Description: One of the motivations of HDFS-1073 is that it decouples the checkpoint process so that multiple checkpoints could be taken at the same time and not interfere with each other. Currently on the 1073 branch this doesn't quite work right, since we have some state and validation in FSImage that's tied to a single fsimage_N -- thus if two 2NNs perform a checkpoint at different transaction IDs, only one will succeed. As a stress test, we can run two 2NNs each configured with the fs.checkpoint.interval set to 0 which causes them to continuously checkpoint as fast as they can. Affects Version/s: Edit log branch (HDFS-1073) Fix Version/s: Edit log branch (HDFS-1073) HDFS-1073: Enable multiple checkpointers to run simultaneously -- Key: HDFS-1984 URL: https://issues.apache.org/jira/browse/HDFS-1984 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) One of the motivations of HDFS-1073 is that it decouples the checkpoint process so that multiple checkpoints could be taken at the same time and not interfere with each other. Currently on the 1073 branch this doesn't quite work right, since we have some state and validation in FSImage that's tied to a single fsimage_N -- thus if two 2NNs perform a checkpoint at different transaction IDs, only one will succeed. As a stress test, we can run two 2NNs each configured with the fs.checkpoint.interval set to 0 which causes them to continuously checkpoint as fast as they can. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1984) HDFS-1073: Enable multiple checkpointers to run simultaneously
[ https://issues.apache.org/jira/browse/HDFS-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038288#comment-13038288 ] Todd Lipcon commented on HDFS-1984: --- Currently this test scenario fails after a few seconds with an exception like: 11/05/23 15:25:46 WARN mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Namenode has an edit log corresponding to txid 1240 but new checkpoint was created using editlog ending at txid 1238. Checkpoint Aborted. at org.apache.hadoop.hdfs.server.namenode.FSImage.validateCheckpointUpload(FSImage.java:894) at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:107) at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:80) but this validation is bogus. So long as no two checkpointers try to upload a checkpoint at the same txid, it's OK if they upload old fsimages. To fix this, I think we need to do the following: a) Repurpose the checkpointTxId field of FSImage. This currently tracks the last txid at which the NN has either saved or uploaded a checkpoint. We use it to advertise which image file a checkpointer should download, but we also use it to validate the checkpoint upload. Instead, it should be renamed to mostRecentImageTxId and only be used to advertise the image. b) Remove the imageDigest field. The function of validation is now being done by an adjacent .md5 file next to each image. When the checkpointer downloads an image, the image transfer servlet can just read the .md5 file and include the hash as an HTTP header. The checkpointer can then verify that it transferred correctly by comparing the image it downloaded against that md5 hash. When uploading the new checkpoint back to the NN, the same process is used in reverse. 
The new validation rules for accepting a checkpoint upload should be: - the namespace/clusterid/etc match up (same as today) - the transaction ID of the uploaded image is less than the current transaction ID of the namespace (sanity check) - the hash of the file received matches the hash that the 2NN communicates in a header HDFS-1073: Enable multiple checkpointers to run simultaneously -- Key: HDFS-1984 URL: https://issues.apache.org/jira/browse/HDFS-1984 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) One of the motivations of HDFS-1073 is that it decouples the checkpoint process so that multiple checkpoints could be taken at the same time and not interfere with each other. Currently on the 1073 branch this doesn't quite work right, since we have some state and validation in FSImage that's tied to a single fsimage_N -- thus if two 2NNs perform a checkpoint at different transaction IDs, only one will succeed. As a stress test, we can run two 2NNs each configured with the fs.checkpoint.interval set to 0 which causes them to continuously checkpoint as fast as they can. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
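The receiving side of the proposed md5 check can be sketched as follows (class and method names are illustrative, not the branch's actual TransferFsImage code): the server reads the image's adjacent .md5 file and sends the hash in an HTTP header, and the downloader re-hashes the bytes it received and rejects the transfer on mismatch.

```java
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hedged sketch of the proposed transfer-integrity check. The advertised
// hash would come from an HTTP header populated from the .md5 file that
// sits next to each fsimage on disk.
class ImageTransferCheck {
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Throws if the transferred bytes do not match the advertised hash.
    static void verifyDownload(byte[] received, String advertisedMd5)
            throws IOException, NoSuchAlgorithmException {
        String actual = md5Hex(received);
        if (!actual.equals(advertisedMd5)) {
            throw new IOException("Image transfer corrupt: expected "
                    + advertisedMd5 + " but got " + actual);
        }
    }
}
```

The same check runs in reverse when the 2NN uploads the new checkpoint back to the NN, so a corrupted transfer is caught on either leg.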
[jira] [Updated] (HDFS-1963) HDFS rpm integration project
[ https://issues.apache.org/jira/browse/HDFS-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated HDFS-1963: Attachment: HDFS-1963-3.patch Change configuration directory from $PREFIX/conf to $PREFIX/etc/hadoop per Owen's recommendation. For RPM/deb, it will use /etc/hadoop as default, and create a symlink for $PREFIX/etc/hadoop pointing to /etc/hadoop. HDFS rpm integration project Key: HDFS-1963 URL: https://issues.apache.org/jira/browse/HDFS-1963 Project: Hadoop HDFS Issue Type: New Feature Components: build Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1963-1.patch, HDFS-1963-2.patch, HDFS-1963-3.patch, HDFS-1963.patch This jira corresponds to HADOOP-6255 and the associated directory layout change. The patch for creating HDFS rpm packaging should be posted here for the patch test build to verify against hdfs svn trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1985) HDFS-1073: Cleanup in image transfer servlet
HDFS-1073: Cleanup in image transfer servlet Key: HDFS-1985 URL: https://issues.apache.org/jira/browse/HDFS-1985 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Todd Lipcon Assignee: Todd Lipcon -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1985) HDFS-1073: Cleanup in image transfer servlet
[ https://issues.apache.org/jira/browse/HDFS-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1985: -- Component/s: name-node Description: The TransferFsImage class has grown several heads and is somewhat confusing to follow. This JIRA is to refactor it a little bit. - the TransferFsImage class contains static methods to put/get image and edits files. It's used by checkpointing nodes. [the same static methods it has today] - some common code from call sites of TransferFsImage are moved into TransferFsImage itself, so it presents a cleaner interface to checkpointers - the non-static parts of TransferFsImage are moved to an inner class of GetImageServlet called GetImageParams, since they were only responsible for parameter parsing/validation. Affects Version/s: Edit log branch (HDFS-1073) Fix Version/s: Edit log branch (HDFS-1073) HDFS-1073: Cleanup in image transfer servlet Key: HDFS-1985 URL: https://issues.apache.org/jira/browse/HDFS-1985 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) The TransferFsImage class has grown several heads and is somewhat confusing to follow. This JIRA is to refactor it a little bit. - the TransferFsImage class contains static methods to put/get image and edits files. It's used by checkpointing nodes. [the same static methods it has today] - some common code from call sites of TransferFsImage are moved into TransferFsImage itself, so it presents a cleaner interface to checkpointers - the non-static parts of TransferFsImage are moved to an inner class of GetImageServlet called GetImageParams, since they were only responsible for parameter parsing/validation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1985) HDFS-1073: Cleanup in image transfer servlet
[ https://issues.apache.org/jira/browse/HDFS-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1985: -- Attachment: hdfs-1985.txt Attached patch does the above refactoring and cleanup. While I was at it, I also made the client code check the HTTP response for 200 OK status. This fixes the client error reporting behavior in the event that the server throws an exception while processing the request. HDFS-1073: Cleanup in image transfer servlet Key: HDFS-1985 URL: https://issues.apache.org/jira/browse/HDFS-1985 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1985.txt The TransferFsImage class has grown several heads and is somewhat confusing to follow. This JIRA is to refactor it a little bit. - the TransferFsImage class contains static methods to put/get image and edits files. It's used by checkpointing nodes. [the same static methods it has today] - some common code from call sites of TransferFsImage are moved into TransferFsImage itself, so it presents a cleaner interface to checkpointers - the non-static parts of TransferFsImage are moved to an inner class of GetImageServlet called GetImageParams, since they were only responsible for parameter parsing/validation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
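The response-code check Todd mentions is conceptually simple; a hedged sketch of the client-side guard (illustrative, not the actual patch code) is: fail the transfer with a useful message whenever the image servlet returns anything but 200 OK, instead of silently reading an error page as image data.

```java
import java.io.IOException;
import java.net.HttpURLConnection;

// Hypothetical sketch: after connecting to the getimage servlet, check
// the HTTP status before consuming the body, so a server-side exception
// surfaces as a clear client-side error.
class ResponseCheck {
    static void checkResponse(int responseCode, String responseMessage)
            throws IOException {
        if (responseCode != HttpURLConnection.HTTP_OK) {
            throw new IOException("Image transfer servlet responded "
                    + responseCode + ": " + responseMessage);
        }
    }
}
```

In the real flow the two arguments would come from HttpURLConnection.getResponseCode() and getResponseMessage() on the open connection.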
[jira] [Created] (HDFS-1986) Add an option for user to return http or https ports regardless of security is on/off in DFSUtil.getInfoServer()
Add an option for user to return http or https ports regardless of security is on/off in DFSUtil.getInfoServer() Key: HDFS-1986 URL: https://issues.apache.org/jira/browse/HDFS-1986 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 0.23.0 Reporter: Tanping Wang Assignee: Tanping Wang Priority: Minor Fix For: 0.23.0 Currently DFSUtil.getInfoServer gets http port with security off and httpS port with security on. However, we want to return http port regardless of security on/off for Cluster UI to use. Add in a third Boolean parameter for user to decide whether to check security or not. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-422) fuse-dfs leaks FileSystem handles as it never disconnects them because the FileSystem.Cache does not do reference counting
[ https://issues.apache.org/jira/browse/HDFS-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HDFS-422. -- Resolution: Duplicate fuse-dfs leaks FileSystem handles as it never disconnects them because the FileSystem.Cache does not do reference counting -- Key: HDFS-422 URL: https://issues.apache.org/jira/browse/HDFS-422 Project: Hadoop HDFS Issue Type: Bug Components: contrib/fuse-dfs Reporter: Pete Wyckoff Priority: Minor since users may be doing multiple file operations at the same time, a single task in fuse, can never call close() on a filesystem (ie libhdfs::hdfsDisconnect) because there may be another thread for the same user. as such, either fuse-dfs needs to do reference counting or FileSystem.Cache needs to or maybe enable a mode where one can turn off the Cache?? I currently am not seeing any problems in production, but I am still running 0.18 version which keeps only one connection as root. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1986) Add an option for user to return http or https ports regardless of whether security is on/off in DFSUtil.getInfoServer()
[ https://issues.apache.org/jira/browse/HDFS-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanping Wang updated HDFS-1986: --- Attachment: HDFS-1986.patch Add an option for user to return http or https ports regardless of whether security is on/off in DFSUtil.getInfoServer() Key: HDFS-1986 URL: https://issues.apache.org/jira/browse/HDFS-1986 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 0.23.0 Reporter: Tanping Wang Assignee: Tanping Wang Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1986.patch Currently DFSUtil.getInfoServer returns the http port with security off and the https port with security on. However, we want to return the http port regardless of whether security is on or off, for the Cluster UI to use. Add a third Boolean parameter that lets the caller decide whether to check security or not.
[jira] [Commented] (HDFS-1985) HDFS-1073: Cleanup in image transfer servlet
[ https://issues.apache.org/jira/browse/HDFS-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038311#comment-13038311 ] Eli Collins commented on HDFS-1985: --- +1 pending Hudson. This is much nicer. Nit: indent the throws in parseLongParam and downloadImageToStorage HDFS-1073: Cleanup in image transfer servlet Key: HDFS-1985 URL: https://issues.apache.org/jira/browse/HDFS-1985 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1985.txt The TransferFsImage class has grown several heads and is somewhat confusing to follow. This JIRA is to refactor it a little bit. - the TransferFsImage class contains static methods to put/get image and edits files. It's used by checkpointing nodes. [the same static methods it has today] - some common code from call sites of TransferFsImage are moved into TransferFsImage itself, so it presents a cleaner interface to checkpointers - the non-static parts of TransferFsImage are moved to an inner class of GetImageServlet called GetImageParams, since they were only responsible for parameter parsing/validation.
[jira] [Commented] (HDFS-1985) HDFS-1073: Cleanup in image transfer servlet
[ https://issues.apache.org/jira/browse/HDFS-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038341#comment-13038341 ] Todd Lipcon commented on HDFS-1985: --- Will fix indentation nits on commit. This is on the 1073 branch, so Hudson won't run against it. I ran TestCheckpoint which covers this code pretty well, as well as a subset of tests (all those modified in the last 4 commits on the branch). They all passed (except for BN-related ones, which are known to be broken at the moment). HDFS-1073: Cleanup in image transfer servlet Key: HDFS-1985 URL: https://issues.apache.org/jira/browse/HDFS-1985 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1985.txt The TransferFsImage class has grown several heads and is somewhat confusing to follow. This JIRA is to refactor it a little bit. - the TransferFsImage class contains static methods to put/get image and edits files. It's used by checkpointing nodes. [the same static methods it has today] - some common code from call sites of TransferFsImage are moved into TransferFsImage itself, so it presents a cleaner interface to checkpointers - the non-static parts of TransferFsImage are moved to an inner class of GetImageServlet called GetImageParams, since they were only responsible for parameter parsing/validation.
[jira] [Resolved] (HDFS-1985) HDFS-1073: Cleanup in image transfer servlet
[ https://issues.apache.org/jira/browse/HDFS-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-1985. --- Resolution: Fixed Hadoop Flags: [Reviewed] HDFS-1073: Cleanup in image transfer servlet Key: HDFS-1985 URL: https://issues.apache.org/jira/browse/HDFS-1985 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1985.txt The TransferFsImage class has grown several heads and is somewhat confusing to follow. This JIRA is to refactor it a little bit. - the TransferFsImage class contains static methods to put/get image and edits files. It's used by checkpointing nodes. [the same static methods it has today] - some common code from call sites of TransferFsImage are moved into TransferFsImage itself, so it presents a cleaner interface to checkpointers - the non-static parts of TransferFsImage are moved to an inner class of GetImageServlet called GetImageParams, since they were only responsible for parameter parsing/validation.
[jira] [Commented] (HDFS-1969) Running rollback on new-version namenode destroys namespace
[ https://issues.apache.org/jira/browse/HDFS-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038345#comment-13038345 ] Konstantin Shvachko commented on HDFS-1969: --- Todd, what I meant is that {{NNStorage.setFields()}} should not contain the if statement {code} if (layoutVersion <= -26) { ... } {code} I actually don't see where it is triggered. In general, we can allow these ifs in the loading part of the code, like loadFSImage(). But the saving part should be free of dependencies on the layout version, because there is only one LV - the current one. The precondition sounds good. But it would be better to just convert it to an assert. I don't think we've used {{com.google.*}} before, at least not in HDFS. Why introduce it now? Running rollback on new-version namenode destroys namespace --- Key: HDFS-1969 URL: https://issues.apache.org/jira/browse/HDFS-1969 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1969.txt, hdfs-1969.txt The following sequence leaves the namespace in an inconsistent/broken state: - format NN using 0.20 (or any prior release, probably) - run hdfs namenode -upgrade on 0.22. ^C the NN once it comes up. - run hdfs namenode -rollback on 0.22 (this should fail but doesn't!) This leaves the name directory in a state such that the version file claims it's an 0.20 namespace, but the fsimage is in 0.22 format. It then crashes when trying to start up.
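For context on the quoted check: HDFS layout versions are negative integers that decrease as the on-disk format evolves, so "at least as new as LV -26" reads as in this hedged sketch (illustrative only, not the actual NNStorage code):

```java
// Sketch only: HDFS layout versions are negative and decrease over time,
// so a layout "at least as new as" a required version compares with <=.
public class LayoutVersionCheck {
    static boolean atLeastVersion(int layoutVersion, int required) {
        return layoutVersion <= required;
    }
}
```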
[jira] [Created] (HDFS-1987) HDFS-1073: Test for 2NN downloading image is not running
HDFS-1073: Test for 2NN downloading image is not running Key: HDFS-1987 URL: https://issues.apache.org/jira/browse/HDFS-1987 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Todd Lipcon Assignee: Todd Lipcon
[jira] [Updated] (HDFS-1987) HDFS-1073: Test for 2NN downloading image is not running
[ https://issues.apache.org/jira/browse/HDFS-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1987: -- Component/s: name-node Description: TestCheckpoint.testSecondaryImageDownload was introduced at some point but was never called from anywhere, so it wasn't actually running. This JIRA is to fix it up to work on trunk and actually run as part of the test suite. Affects Version/s: Edit log branch (HDFS-1073) Fix Version/s: Edit log branch (HDFS-1073) HDFS-1073: Test for 2NN downloading image is not running Key: HDFS-1987 URL: https://issues.apache.org/jira/browse/HDFS-1987 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) TestCheckpoint.testSecondaryImageDownload was introduced at some point but was never called from anywhere, so it wasn't actually running. This JIRA is to fix it up to work on trunk and actually run as part of the test suite.
[jira] [Commented] (HDFS-1969) Running rollback on new-version namenode destroys namespace
[ https://issues.apache.org/jira/browse/HDFS-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038351#comment-13038351 ] Todd Lipcon commented on HDFS-1969: --- bq. I actually don't see where it is triggered. The test code uses this code in order to create VERSION files that look like they came from older versions. We could copy-paste some new code in to generate VERSION files, but that's a bit messy too. bq. The precondition sounds good. But it would be better to just convert it to assert. I don't think we've used com.google.* before at least not in HDFS. Why introduce it now. There was a vote on the mailing list a few months back and people said they were OK with including Guava (com.google.*). The advantage of Preconditions over assert is that Preconditions will always run regardless of JVM options. In areas that aren't performance-sensitive, this is preferred. Running rollback on new-version namenode destroys namespace --- Key: HDFS-1969 URL: https://issues.apache.org/jira/browse/HDFS-1969 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1969.txt, hdfs-1969.txt The following sequence leaves the namespace in an inconsistent/broken state: - format NN using 0.20 (or any prior release, probably) - run hdfs namenode -upgrade on 0.22. ^C the NN once it comes up. - run hdfs namenode -rollback on 0.22 (this should fail but doesn't!) This leaves the name directory in a state such that the version file claims it's an 0.20 namespace, but the fsimage is in 0.22 format. It then crashes when trying to start up.
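The distinction Todd describes can be shown with a minimal stand-in for Guava's Preconditions.checkState (a sketch, not Guava itself): the explicit check always runs, whereas `assert` is a no-op unless the JVM was started with -ea.

```java
// Minimal stand-in for Guava's Preconditions.checkState; illustrative only.
public class Checks {
    static void checkState(boolean expression, String message) {
        // Runs unconditionally, unlike `assert`, which the JVM skips
        // entirely unless assertions are enabled with -ea.
        if (!expression) {
            throw new IllegalStateException(message);
        }
    }
}
```

This is why a precondition is preferred for correctness checks on non-hot paths: it cannot be silently disabled by the runtime configuration.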
[jira] [Updated] (HDFS-1987) HDFS-1073: Test for 2NN downloading image is not running
[ https://issues.apache.org/jira/browse/HDFS-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1987: -- Attachment: hdfs-1987.txt Attached patch makes this test case actually run, and fixes it up to work with the new edits log layout. HDFS-1073: Test for 2NN downloading image is not running Key: HDFS-1987 URL: https://issues.apache.org/jira/browse/HDFS-1987 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1987.txt TestCheckpoint.testSecondaryImageDownload was introduced at some point but was never called from anywhere, so it wasn't actually running. This JIRA is to fix it up to work on trunk and actually run as part of the test suite.
[jira] [Updated] (HDFS-1984) HDFS-1073: Enable multiple checkpointers to run simultaneously
[ https://issues.apache.org/jira/browse/HDFS-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1984: -- Attachment: hdfs-1984.txt Here's a patch that does the above, and also adds two new test cases: 1) simulates a corrupt byte while transferring the image, making sure it correctly detects it and rejects the upload 2) runs two 2NNs interleaved using Mockito to be sure that they don't interfere with each other I also ran the test from the command line as described above. I was able to run two 2NNs both checkpointing as fast as they could. There was one minor unrelated race condition that I'll address as a followup. HDFS-1073: Enable multiple checkpointers to run simultaneously -- Key: HDFS-1984 URL: https://issues.apache.org/jira/browse/HDFS-1984 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1984.txt One of the motivations of HDFS-1073 is that it decouples the checkpoint process so that multiple checkpoints could be taken at the same time and not interfere with each other. Currently on the 1073 branch this doesn't quite work right, since we have some state and validation in FSImage that's tied to a single fsimage_N -- thus if two 2NNs perform a checkpoint at different transaction IDs, only one will succeed. As a stress test, we can run two 2NNs each configured with the fs.checkpoint.interval set to 0 which causes them to continuously checkpoint as fast as they can.
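The stress setup described above would amount to a configuration entry like the following in each 2NN's site configuration. The property name is taken verbatim from the issue text; the exact key may differ by release, so treat this as a sketch:

```xml
<!-- Hypothetical stress-test setting from the HDFS-1984 description:
     checkpoint continuously, with no delay between checkpoints. -->
<property>
  <name>fs.checkpoint.interval</name>
  <value>0</value>
</property>
```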