[jira] [Commented] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system

2013-05-21 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662772#comment-13662772
 ] 

Ivan Mitic commented on HDFS-4837:
--

+1 on the proposal

I reviewed the patch; the approach looks good to me, with a few comments/questions 
below:
1. One thing worth checking is what this means for HA-enabled clusters, where 
you have two configured namenodes.
2. Should we also query for DFS in DFSAdmin#setBalancerBandwidth()?
3. It would be good to add a unit test for the new functionality. TestDFSShell 
looks like a good place since it already has a test case for DFSAdmin (see 
testInvalidShell).



> Allow DFSAdmin to run when HDFS is not the default file system
> --
>
> Key: HDFS-4837
> URL: https://issues.apache.org/jira/browse/HDFS-4837
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Mostafa Elhemali
>Assignee: Mostafa Elhemali
> Attachments: HDFS-4837.patch
>
>
> When Hadoop is running with a default file system other than HDFS, but still 
> has the HDFS namenode running, we are unable to run dfsadmin commands.
> I suggest that DFSAdmin use the same mechanism the NameNode does today to get 
> its address: look at dfs.namenode.rpc-address, and if it is not set, fall back 
> to getting it from the default file system.
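A minimal sketch of the lookup order the description proposes, with a plain {{Map}} standing in for Hadoop's {{Configuration}} (the class and helper names here are invented for illustration, not the actual DFSAdmin code):

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.Map;

// Hypothetical sketch: prefer dfs.namenode.rpc-address, fall back to the
// default file system URI. A Map stands in for Hadoop's Configuration.
public class NameNodeAddressSketch {

    static InetSocketAddress resolve(Map<String, String> conf) {
        // 1. Prefer the explicit RPC address, if configured.
        String rpc = conf.get("dfs.namenode.rpc-address");
        if (rpc != null) {
            int colon = rpc.lastIndexOf(':');
            return InetSocketAddress.createUnresolved(
                rpc.substring(0, colon),
                Integer.parseInt(rpc.substring(colon + 1)));
        }
        // 2. Otherwise fall back to the default file system URI,
        //    which must then be an hdfs:// URI.
        URI defaultFs = URI.create(conf.get("fs.defaultFS"));
        if (!"hdfs".equals(defaultFs.getScheme())) {
            throw new IllegalStateException(
                "Default FS is not HDFS and dfs.namenode.rpc-address is unset");
        }
        return InetSocketAddress.createUnresolved(
            defaultFs.getHost(), defaultFs.getPort());
    }
}
```

With this order, dfsadmin keeps working even when fs.defaultFS points at a non-HDFS file system, as long as the RPC address is set explicitly.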

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4835) Port trunk WebHDFS changes to branch-0.23

2013-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662956#comment-13662956
 ] 

Hudson commented on HDFS-4835:
--

Integrated in Hadoop-Hdfs-0.23-Build #614 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/614/])
HDFS-4835. Port trunk WebHDFS changes to branch-0.23. Contributed by Robert 
Parker. (Revision 1484574)

 Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1484574
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HftpFileSystem.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenRenewer.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHftpFileSystem.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java


> Port trunk WebHDFS changes to branch-0.23 
> --
>
> Key: HDFS-4835
> URL: https://issues.apache.org/jira/browse/HDFS-4835
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 0.23.7
>Reporter: Robert Parker
>Assignee: Robert Parker
>Priority: Critical
> Fix For: 0.23.8
>
> Attachments: HDFS-4835v1.patch, HDFS-4835v2.patch
>
>
> HADOOP-9549 and HDFS-4805 made changes to WebHDFS and DelegationTokenRenewer 
> to make them more robust for secure clusters.



[jira] [Updated] (HDFS-3875) Issue handling checksum errors in write pipeline

2013-05-21 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-3875:
-

   Resolution: Fixed
Fix Version/s: 0.23.8
   2.0.5-beta
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-2 and branch-0.23.
Thanks everybody for the reviews.

> Issue handling checksum errors in write pipeline
> 
>
> Key: HDFS-3875
> URL: https://issues.apache.org/jira/browse/HDFS-3875
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 3.0.0, 2.0.5-beta, 0.23.8
>
> Attachments: hdfs-3875.branch-0.23.no.test.patch.txt, 
> hdfs-3875.branch-0.23.patch.txt, hdfs-3875.branch-0.23.patch.txt, 
> hdfs-3875.branch-0.23.with.test.patch.txt, hdfs-3875.branch-2.patch.txt, 
> hdfs-3875.patch.txt, hdfs-3875.patch.txt, hdfs-3875.patch.txt, 
> hdfs-3875.trunk.no.test.patch.txt, hdfs-3875.trunk.no.test.patch.txt, 
> hdfs-3875.trunk.patch.txt, hdfs-3875.trunk.patch.txt, 
> hdfs-3875.trunk.with.test.patch.txt, hdfs-3875.trunk.with.test.patch.txt, 
> hdfs-3875-wip.patch
>
>
> We saw this issue with one block in a large test cluster. The client is 
> storing the data with replication level 2, and we saw the following:
> - the second node in the pipeline detects a checksum error on the data it 
> received from the first node. We don't know if the client sent a bad 
> checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
> - this caused the second node to get kicked out of the pipeline, since it 
> threw an exception. The pipeline started up again with only one replica (the 
> first node in the pipeline)
> - this replica was later determined to be corrupt by the block scanner, and 
> unrecoverable since it is the only replica



[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline

2013-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662982#comment-13662982
 ] 

Hudson commented on HDFS-3875:
--

Integrated in Hadoop-trunk-Commit #3771 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3771/])
HDFS-3875. Issue handling checksum errors in write pipeline. Contributed by 
Kihwal Lee. (Revision 1484808)

 Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1484808
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClientFaultInjector.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestCrcCorruption.java


> Issue handling checksum errors in write pipeline
> 
>
> Key: HDFS-3875
> URL: https://issues.apache.org/jira/browse/HDFS-3875
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 3.0.0, 2.0.5-beta, 0.23.8
>
> Attachments: hdfs-3875.branch-0.23.no.test.patch.txt, 
> hdfs-3875.branch-0.23.patch.txt, hdfs-3875.branch-0.23.patch.txt, 
> hdfs-3875.branch-0.23.with.test.patch.txt, hdfs-3875.branch-2.patch.txt, 
> hdfs-3875.patch.txt, hdfs-3875.patch.txt, hdfs-3875.patch.txt, 
> hdfs-3875.trunk.no.test.patch.txt, hdfs-3875.trunk.no.test.patch.txt, 
> hdfs-3875.trunk.patch.txt, hdfs-3875.trunk.patch.txt, 
> hdfs-3875.trunk.with.test.patch.txt, hdfs-3875.trunk.with.test.patch.txt, 
> hdfs-3875-wip.patch
>
>
> We saw this issue with one block in a large test cluster. The client is 
> storing the data with replication level 2, and we saw the following:
> - the second node in the pipeline detects a checksum error on the data it 
> received from the first node. We don't know if the client sent a bad 
> checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
> - this caused the second node to get kicked out of the pipeline, since it 
> threw an exception. The pipeline started up again with only one replica (the 
> first node in the pipeline)
> - this replica was later determined to be corrupt by the block scanner, and 
> unrecoverable since it is the only replica



[jira] [Updated] (HDFS-4298) StorageRetentionManager spews warnings when used with QJM

2013-05-21 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-4298:
-

Attachment: HDFS-4298.patch

New patch to fix the findbugs warning and test failures.

> StorageRetentionManager spews warnings when used with QJM
> -
>
> Key: HDFS-4298
> URL: https://issues.apache.org/jira/browse/HDFS-4298
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
> Attachments: HDFS-4298.patch, HDFS-4298.patch
>
>
> When the NN is configured with a QJM, we see the following warning message 
> every time a checkpoint is made or uploaded:
> 12/12/10 16:07:52 WARN namenode.FSEditLog: Unable to determine input streams 
> from QJM to [127.0.0.1:13001, 127.0.0.1:13002, 127.0.0.1:13003]. Skipping.
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many 
> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> 127.0.0.1:13002: Asked for firstTxId 114837 which is in the middle of file 
> /tmp/jn-2/myjournal/current/edits_0095185-0114846
> ...
> This is because, since HDFS-2946, the NN calls {{selectInputStreams}} to 
> determine the number of log segments and put a cap on the number. This API 
> throws an exception in the case of QJM if the argument falls in the middle of 
> an edit log segment.



[jira] [Commented] (HDFS-4298) StorageRetentionManager spews warnings when used with QJM

2013-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663091#comment-13663091
 ] 

Hadoop QA commented on HDFS-4298:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12584012/HDFS-4298.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

  
org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
  
org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4420//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4420//console

This message is automatically generated.

> StorageRetentionManager spews warnings when used with QJM
> -
>
> Key: HDFS-4298
> URL: https://issues.apache.org/jira/browse/HDFS-4298
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
> Attachments: HDFS-4298.patch, HDFS-4298.patch
>
>
> When the NN is configured with a QJM, we see the following warning message 
> every time a checkpoint is made or uploaded:
> 12/12/10 16:07:52 WARN namenode.FSEditLog: Unable to determine input streams 
> from QJM to [127.0.0.1:13001, 127.0.0.1:13002, 127.0.0.1:13003]. Skipping.
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many 
> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> 127.0.0.1:13002: Asked for firstTxId 114837 which is in the middle of file 
> /tmp/jn-2/myjournal/current/edits_0095185-0114846
> ...
> This is because, since HDFS-2946, the NN calls {{selectInputStreams}} to 
> determine the number of log segments and put a cap on the number. This API 
> throws an exception in the case of QJM if the argument falls in the middle of 
> an edit log segment.



[jira] [Commented] (HDFS-4839) add NativeIO#mkdirs, that provides an error message on failure

2013-05-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663159#comment-13663159
 ] 

Chris Nauroth commented on HDFS-4839:
-

Recent patches for native integration (i.e. Colin's native rename in HDFS-4428 
and Ivan's file permission work in HADOOP-9413) have done a great job of 
encapsulating the JNI call, and the fallback when the native library is not 
loaded, behind a single Java method.  Doing this sets us up for a very easy 
transition to the new Java 7 file APIs whenever we migrate.  At that point, we 
can delete our own native code paths.  Hopefully, this will bring the 
maintenance concerns under control in the long run.
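The encapsulation pattern described above can be sketched roughly as follows; the class, field, and method names here are hypothetical, and the real NativeIO code differs:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical illustration of the pattern: one Java entry point hides both
// the JNI path and the pure-Java fallback, so callers never branch on native
// availability themselves.
public class MkdirsFallbackSketch {

    // In real Hadoop code this would come from the native code loader.
    static boolean nativeLoaded = false;

    public static void mkdirs(File dir) throws IOException {
        if (nativeLoaded) {
            // JNI call that can raise an IOException with errno detail.
            mkdirsNative(dir);
        } else if (!dir.mkdirs() && !dir.isDirectory()) {
            // Fallback: java.io only returns a boolean, so the message
            // here is necessarily generic.
            throw new IOException("Failed to create directory " + dir);
        }
    }

    // Placeholder for the JNI-backed implementation; never invoked above
    // while nativeLoaded is false.
    private static native void mkdirsNative(File dir);
}
```

Because the branch lives in one place, swapping the native path for a JDK7 API later touches a single method rather than every caller.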

> add NativeIO#mkdirs, that provides an error message on failure
> --
>
> Key: HDFS-4839
> URL: https://issues.apache.org/jira/browse/HDFS-4839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-beta
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> It would be nice to have a variant of mkdirs that provided an error message 
> explaining why it failed.  This would make it easier to debug certain failing 
> unit tests that rely on mkdir / mkdirs-- the ChecksumFilesystem tests, for 
> example.



[jira] [Commented] (HDFS-4839) add NativeIO#mkdirs, that provides an error message on failure

2013-05-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663166#comment-13663166
 ] 

Colin Patrick McCabe commented on HDFS-4839:


Ivan, you bring up a valid point.  Let's create JIRAs to use the JDK7 APIs when 
they become available to us.  I can think of at least three cases where JDK7 
will allow us to reduce the amount of native code or shell calls:

* symlinks / hardlinks, which JDK7 has support for but JDK6 does not (we 
currently run shell code to create them in the HDFS upgrade code)
* mkdir / mkdirs with an appropriate error message
* rename with an error message

That way, in a few years when the Hadoop PMC makes the decision to drop support 
for JDK6, we will be able to switch over to a pure Java solution easily.  In 
the meantime, I think we ought to provide the error message on failure, as a 
service to our users (and developers!).
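Assuming JDK7's {{java.nio.file}} package, the three cases above map onto standard library calls that throw {{IOException}}s carrying OS-level detail; this is only an illustration of the JDK API, not Hadoop code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Each call below throws an IOException subclass with a descriptive message
// on failure, with no native code or shelling out required.
public class Jdk7Sketch {

    static void demo(Path base) throws IOException {
        Path dir = base.resolve("d");
        Files.createDirectories(dir);              // mkdirs with real errors
        Path file = dir.resolve("t");
        Files.write(file, new byte[] {1});
        Files.move(file, dir.resolve("renamed"));  // rename with an error message
        Files.createSymbolicLink(dir.resolve("s"), // symlink support (absent in JDK6)
                                 dir.resolve("renamed"));
    }
}
```

For example, moving onto an existing target raises {{FileAlreadyExistsException}} naming the offending path, which is exactly the diagnostic the shell-exec and JNI workarounds exist to provide today.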

> add NativeIO#mkdirs, that provides an error message on failure
> --
>
> Key: HDFS-4839
> URL: https://issues.apache.org/jira/browse/HDFS-4839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-beta
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> It would be nice to have a variant of mkdirs that provided an error message 
> explaining why it failed.  This would make it easier to debug certain failing 
> unit tests that rely on mkdir / mkdirs-- the ChecksumFilesystem tests, for 
> example.



[jira] [Commented] (HDFS-4298) StorageRetentionManager spews warnings when used with QJM

2013-05-21 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663251#comment-13663251
 ] 

Aaron T. Myers commented on HDFS-4298:
--

TestBookKeeperHACheckpoints is currently failing on trunk, and I think that the 
TestListCorruptFileBlocks failure was spurious. The latter doesn't fail for me 
when run on my local box, and the test failure was because of an NPE in the 
BlockManager replication monitor, which has nothing to do with this patch.

Please review.

> StorageRetentionManager spews warnings when used with QJM
> -
>
> Key: HDFS-4298
> URL: https://issues.apache.org/jira/browse/HDFS-4298
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
> Attachments: HDFS-4298.patch, HDFS-4298.patch
>
>
> When the NN is configured with a QJM, we see the following warning message 
> every time a checkpoint is made or uploaded:
> 12/12/10 16:07:52 WARN namenode.FSEditLog: Unable to determine input streams 
> from QJM to [127.0.0.1:13001, 127.0.0.1:13002, 127.0.0.1:13003]. Skipping.
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many 
> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> 127.0.0.1:13002: Asked for firstTxId 114837 which is in the middle of file 
> /tmp/jn-2/myjournal/current/edits_0095185-0114846
> ...
> This is because, since HDFS-2946, the NN calls {{selectInputStreams}} to 
> determine the number of log segments and put a cap on the number. This API 
> throws an exception in the case of QJM if the argument falls in the middle of 
> an edit log segment.



[jira] [Updated] (HDFS-4677) Editlog should support synchronous writes

2013-05-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-4677:


Attachment: HDFS-4677.3.patch
HDFS-4677.3.patch

{quote}
One thing that bothers me in the latest patch is now having two constructors 
for FileJournalManager and EditLogFileOutputStream, one with conf and one 
without. Given that the one with conf is the right choice for most cases, it 
might make sense to lose the other one. However, making this change would 
further increase the scope of this simple Jira, so I’m deferring this question 
to the community.
{quote}

At first, I was going to suggest deferring removal of the old constructors to a 
separate cleanup jira.  Then, I started investigating what it would take to do 
it right now.  In the course of investigating, I ended up doing it.  :-)  It 
turned out that it wasn't a whole lot of extra work.

Rather than throw that work away, I'm attaching version 3 of the patch to share 
those changes.  I ran the changed tests for verification.  Ivan, how does this 
look to you?

> Editlog should support synchronous writes
> -
>
> Key: HDFS-4677
> URL: https://issues.apache.org/jira/browse/HDFS-4677
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1-win
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
> Attachments: HDFS-4677.2.patch, HDFS-4677.3.patch, HDFS-4677.3.patch, 
> HDFS-4677.patch
>
>
> In the current implementation, the NameNode editlog performs syncs to 
> persistent storage using the {{FileChannel#force}} Java API. This API is 
> documented to be slower than an alternative where {{RandomAccessFile}} is 
> opened with the "rws" flags (synchronous writes). 
> We instrumented {{FileChannel#force}} on Windows, and in some 
> software/hardware configurations it can perform significantly slower than the 
> “rws” alternative.
> In terms of the Windows APIs, FileChannel#force internally calls 
> [FlushFileBuffers|http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx]
>  while RandomAccessFile (“rws”) opens the file with the 
> [FILE_FLAG_WRITE_THROUGH flag|http://support.microsoft.com/kb/99794]. 
> With this Jira I'd like to introduce a flag that provides a means to configure 
> the NameNode to use synchronous writes. There is a catch, though: the behavior 
> of the "rws" flag is platform and hardware specific and might not provide the 
> same level of guarantees as {{FileChannel#force}} w.r.t. flushing the on-disk 
> cache. This is an expert-level setting, and it should be documented as such.
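For illustration, the two durability mechanisms the description contrasts can be sketched with plain JDK calls (this is not the actual EditLogFileOutputStream code; method and file names are arbitrary):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class SyncWriteSketch {

    // Option 1: ordinary writes, made durable by an explicit
    // FileChannel#force (FlushFileBuffers on Windows).
    static void writeWithForce(String path, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            FileChannel ch = raf.getChannel();
            ch.write(ByteBuffer.wrap(data));
            ch.force(true);  // flush both data and metadata to the device
        }
    }

    // Option 2: open with "rws" so every write is synchronous
    // (FILE_FLAG_WRITE_THROUGH on Windows); no separate force() call.
    static void writeSynchronous(String path, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rws")) {
            raf.write(data);
        }
    }
}
```

The proposed flag would simply choose between these two open modes; as the description warns, whether "rws" actually reaches the platters depends on the platform and the disk's write cache.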



[jira] [Updated] (HDFS-4677) Editlog should support synchronous writes

2013-05-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-4677:


Attachment: (was: HDFS-4677.3.patch)

> Editlog should support synchronous writes
> -
>
> Key: HDFS-4677
> URL: https://issues.apache.org/jira/browse/HDFS-4677
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1-win
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
> Attachments: HDFS-4677.2.patch, HDFS-4677.3.patch, HDFS-4677.patch
>
>
> In the current implementation, the NameNode editlog performs syncs to 
> persistent storage using the {{FileChannel#force}} Java API. This API is 
> documented to be slower than an alternative where {{RandomAccessFile}} is 
> opened with the "rws" flags (synchronous writes). 
> We instrumented {{FileChannel#force}} on Windows, and in some 
> software/hardware configurations it can perform significantly slower than the 
> “rws” alternative.
> In terms of the Windows APIs, FileChannel#force internally calls 
> [FlushFileBuffers|http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx]
>  while RandomAccessFile (“rws”) opens the file with the 
> [FILE_FLAG_WRITE_THROUGH flag|http://support.microsoft.com/kb/99794]. 
> With this Jira I'd like to introduce a flag that provides a means to configure 
> the NameNode to use synchronous writes. There is a catch, though: the behavior 
> of the "rws" flag is platform and hardware specific and might not provide the 
> same level of guarantees as {{FileChannel#force}} w.r.t. flushing the on-disk 
> cache. This is an expert-level setting, and it should be documented as such.



[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-05-21 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-4832:
---

Summary: Namenode doesn't change the number of missing blocks in safemode 
when DNs rejoin or leave  (was: Namenode doesn't change the number of missing 
blocks in safemode when DNs rejoin)

> Namenode doesn't change the number of missing blocks in safemode when DNs 
> rejoin or leave
> -
>
> Key: HDFS-4832
> URL: https://issues.apache.org/jira/browse/HDFS-4832
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
>Priority: Critical
> Attachments: HDFS-4832.patch
>
>
> Courtesy Karri VRK Reddy!
> {quote}
> 1. Namenode lost datanodes causing missing blocks
> 2. Namenode was put in safe mode
> 3. Datanode restarted on dead nodes 
> 4. Waited for lots of time for the NN UI to reflect the recovered blocks.
> 5. Forced NN out of safe mode and suddenly,  no more missing blocks anymore.
> {quote}
> I was able to replicate this on 0.23 and trunk. I set 
> dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate 
> "lost" datanode.
> Without the NN updating this list of missing blocks, the grid admins will not 
> know when to take the cluster out of safemode.



[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-05-21 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-4832:
---

Description: 
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes 
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly,  no more missing blocks anymore.
{quote}

I was able to replicate this on 0.23 and trunk. I set 
dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a 
"lost" datanode. The opposite case also has problems (i.e., a Datanode failing 
while the NN is in safemode doesn't lead to a missing-blocks message).

Without the NN updating this list of missing blocks, the grid admins will not 
know when to take the cluster out of safemode.

  was:
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes 
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly,  no more missing blocks anymore.
{quote}

I was able to replicate this on 0.23 and trunk. I set 
dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate 
"lost" datanode.

Without the NN updating this list of missing blocks, the grid admins will not 
know when to take the cluster out of safemode.


> Namenode doesn't change the number of missing blocks in safemode when DNs 
> rejoin or leave
> -
>
> Key: HDFS-4832
> URL: https://issues.apache.org/jira/browse/HDFS-4832
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
>Priority: Critical
> Attachments: HDFS-4832.patch
>
>
> Courtesy Karri VRK Reddy!
> {quote}
> 1. Namenode lost datanodes causing missing blocks
> 2. Namenode was put in safe mode
> 3. Datanode restarted on dead nodes 
> 4. Waited for lots of time for the NN UI to reflect the recovered blocks.
> 5. Forced NN out of safe mode and suddenly,  no more missing blocks anymore.
> {quote}
> I was able to replicate this on 0.23 and trunk. I set 
> dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a 
> "lost" datanode. The opposite case also has problems (i.e., a Datanode failing 
> while the NN is in safemode doesn't lead to a missing-blocks message).
> Without the NN updating this list of missing blocks, the grid admins will not 
> know when to take the cluster out of safemode.



[jira] [Commented] (HDFS-4677) Editlog should support synchronous writes

2013-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663414#comment-13663414
 ] 

Hadoop QA commented on HDFS-4677:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12584057/HDFS-4677.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4421//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4421//console

This message is automatically generated.

> Editlog should support synchronous writes
> -
>
> Key: HDFS-4677
> URL: https://issues.apache.org/jira/browse/HDFS-4677
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1-win
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
> Attachments: HDFS-4677.2.patch, HDFS-4677.3.patch, HDFS-4677.patch
>
>
> In the current implementation, the NameNode editlog performs syncs to 
> persistent storage using the {{FileChannel#force}} Java API. This API is 
> documented to be slower than an alternative where {{RandomAccessFile}} is 
> opened with the "rws" flags (synchronous writes). 
> We instrumented {{FileChannel#force}} on Windows, and in some 
> software/hardware configurations it can perform significantly slower than the 
> “rws” alternative.
> In terms of the Windows APIs, FileChannel#force internally calls 
> [FlushFileBuffers|http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx]
>  while RandomAccessFile (“rws”) opens the file with the 
> [FILE_FLAG_WRITE_THROUGH flag|http://support.microsoft.com/kb/99794]. 
> With this Jira I'd like to introduce a flag that provides a means to configure 
> the NameNode to use synchronous writes. There is a catch, though: the behavior 
> of the "rws" flag is platform and hardware specific and might not provide the 
> same level of guarantees as {{FileChannel#force}} w.r.t. flushing the on-disk 
> cache. This is an expert-level setting, and it should be documented as such.



[jira] [Commented] (HDFS-4677) Editlog should support synchronous writes

2013-05-21 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663651#comment-13663651
 ] 

Ivan Mitic commented on HDFS-4677:
--

Awesome, big thanks Chris! Looks good, +1. Let me prepare the branch-1-win 
patch.

Quick question: should this go to branch-2 as well, given that there was a bit 
of refactoring going on? This would make things easier for future backports. I 
just checked, and the patch is almost completely compatible. 



> Editlog should support synchronous writes
> -
>
> Key: HDFS-4677
> URL: https://issues.apache.org/jira/browse/HDFS-4677
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1-win
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
> Attachments: HDFS-4677.2.patch, HDFS-4677.3.patch, HDFS-4677.patch
>
>
> In the current implementation, NameNode editlog performs syncs to the 
> persistent storage using the {{FileChannel#force}} Java APIs. This API is 
> documented to be slower compared to an alternative where {{RandomAccessFile}} 
> is opened with "rws" flags (synchronous writes). 
> We instrumented {{FileChannel#force}} on Windows, and in some 
> software/hardware configurations it can perform significantly slower than the 
> “rws” alternative.
> In terms of the Windows APIs, FileChannel#force internally calls 
> [FlushFileBuffers|http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx]
>  while RandomAccessFile (“rws”) opens the file with the 
> [FILE_FLAG_WRITE_THROUGH flag|http://support.microsoft.com/kb/99794]. 
> With this Jira I'd like to introduce a flag that provides a means to configure 
> the NameNode to use synchronous writes. There is a catch, though: the behavior 
> of the "rws" flags is platform and hardware specific and might not provide the 
> same level of guarantees as {{FileChannel#force}} w.r.t. flushing the on-disk 
> cache. This is an expert-level setting, and it should be documented as such.



[jira] [Updated] (HDFS-4677) Editlog should support synchronous writes

2013-05-21 Thread Ivan Mitic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Mitic updated HDFS-4677:
-

Attachment: HDFS-4677.branch-1-win.patch

Attaching the branch-1-win compatible patch. 


> Editlog should support synchronous writes
> -
>
> Key: HDFS-4677
> URL: https://issues.apache.org/jira/browse/HDFS-4677
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1-win
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
> Attachments: HDFS-4677.2.patch, HDFS-4677.3.patch, 
> HDFS-4677.branch-1-win.patch, HDFS-4677.patch
>
>
> In the current implementation, NameNode editlog performs syncs to the 
> persistent storage using the {{FileChannel#force}} Java APIs. This API is 
> documented to be slower compared to an alternative where {{RandomAccessFile}} 
> is opened with "rws" flags (synchronous writes). 
> We instrumented {{FileChannel#force}} on Windows, and in some 
> software/hardware configurations it can perform significantly slower than the 
> “rws” alternative.
> In terms of the Windows APIs, FileChannel#force internally calls 
> [FlushFileBuffers|http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx]
>  while RandomAccessFile (“rws”) opens the file with the 
> [FILE_FLAG_WRITE_THROUGH flag|http://support.microsoft.com/kb/99794]. 
> With this Jira I'd like to introduce a flag that provides a means to configure 
> the NameNode to use synchronous writes. There is a catch, though: the behavior 
> of the "rws" flags is platform and hardware specific and might not provide the 
> same level of guarantees as {{FileChannel#force}} w.r.t. flushing the on-disk 
> cache. This is an expert-level setting, and it should be documented as such.



[jira] [Commented] (HDFS-4677) Editlog should support synchronous writes

2013-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663661#comment-13663661
 ] 

Hadoop QA commented on HDFS-4677:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12584219/HDFS-4677.branch-1-win.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4422//console

This message is automatically generated.

> Editlog should support synchronous writes
> -
>
> Key: HDFS-4677
> URL: https://issues.apache.org/jira/browse/HDFS-4677
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1-win
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
> Attachments: HDFS-4677.2.patch, HDFS-4677.3.patch, 
> HDFS-4677.branch-1-win.patch, HDFS-4677.patch
>
>
> In the current implementation, NameNode editlog performs syncs to the 
> persistent storage using the {{FileChannel#force}} Java APIs. This API is 
> documented to be slower compared to an alternative where {{RandomAccessFile}} 
> is opened with "rws" flags (synchronous writes). 
> We instrumented {{FileChannel#force}} on Windows, and in some 
> software/hardware configurations it can perform significantly slower than the 
> “rws” alternative.
> In terms of the Windows APIs, FileChannel#force internally calls 
> [FlushFileBuffers|http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx]
>  while RandomAccessFile (“rws”) opens the file with the 
> [FILE_FLAG_WRITE_THROUGH flag|http://support.microsoft.com/kb/99794]. 
> With this Jira I'd like to introduce a flag that provides a means to configure 
> the NameNode to use synchronous writes. There is a catch, though: the behavior 
> of the "rws" flags is platform and hardware specific and might not provide the 
> same level of guarantees as {{FileChannel#force}} w.r.t. flushing the on-disk 
> cache. This is an expert-level setting, and it should be documented as such.



[jira] [Commented] (HDFS-4677) Editlog should support synchronous writes

2013-05-21 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663689#comment-13663689
 ] 

Ivan Mitic commented on HDFS-4677:
--

bq. -1 overall. Here are the results of testing the latest attachment 
This is expected: Jenkins tried to apply the branch-1 patch to trunk and 
failed. The trunk-compatible patch (HDFS-4677.3.patch) already received a +1 
from Jenkins.

> Editlog should support synchronous writes
> -
>
> Key: HDFS-4677
> URL: https://issues.apache.org/jira/browse/HDFS-4677
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1-win
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
> Attachments: HDFS-4677.2.patch, HDFS-4677.3.patch, 
> HDFS-4677.branch-1-win.patch, HDFS-4677.patch
>
>
> In the current implementation, NameNode editlog performs syncs to the 
> persistent storage using the {{FileChannel#force}} Java APIs. This API is 
> documented to be slower compared to an alternative where {{RandomAccessFile}} 
> is opened with "rws" flags (synchronous writes). 
> We instrumented {{FileChannel#force}} on Windows, and in some 
> software/hardware configurations it can perform significantly slower than the 
> “rws” alternative.
> In terms of the Windows APIs, FileChannel#force internally calls 
> [FlushFileBuffers|http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx]
>  while RandomAccessFile (“rws”) opens the file with the 
> [FILE_FLAG_WRITE_THROUGH flag|http://support.microsoft.com/kb/99794]. 
> With this Jira I'd like to introduce a flag that provides a means to configure 
> the NameNode to use synchronous writes. There is a catch, though: the behavior 
> of the "rws" flags is platform and hardware specific and might not provide the 
> same level of guarantees as {{FileChannel#force}} w.r.t. flushing the on-disk 
> cache. This is an expert-level setting, and it should be documented as such.



[jira] [Commented] (HDFS-4839) add NativeIO#mkdirs, that provides an error message on failure

2013-05-21 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663711#comment-13663711
 ] 

Ivan Mitic commented on HDFS-4839:
--

Thanks Colin and Chris.

bq. Let's create JIRAs to use the JDK7 APIs when they become available to us.
Good idea! I created HADOOP-9590 and documented all the problems we ran into 
w.r.t. file operations on JDK6. Feel free to add to the Jira if I missed 
something.


> add NativeIO#mkdirs, that provides an error message on failure
> --
>
> Key: HDFS-4839
> URL: https://issues.apache.org/jira/browse/HDFS-4839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-beta
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> It would be nice to have a variant of mkdirs that provided an error message 
> explaining why it failed. This would make it easier to debug certain failing 
> unit tests that rely on mkdir / mkdirs; the ChecksumFilesystem tests, for 
> example.
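As a rough sketch of what such a variant could look like in plain Java: this is a hypothetical helper, not the eventual NativeIO API, and a real native implementation would surface the OS errno / GetLastError text instead of guessing at the cause.

```java
import java.io.File;
import java.io.IOException;

public class MkdirsDemo {
    // Like java.io.File#mkdirs, but throws an IOException that says
    // *why* creation failed instead of just returning false.
    static void mkdirsWithDiagnostics(File dir) throws IOException {
        if (dir.isDirectory()) {
            return; // already exists, nothing to do
        }
        if (dir.exists()) {
            throw new IOException(dir + " exists but is not a directory");
        }
        File parent = dir.getParentFile();
        if (parent != null) {
            mkdirsWithDiagnostics(parent); // create ancestors first
        }
        if (!dir.mkdir()) {
            // java.io.File exposes no errno; a native helper could report
            // the exact OS error here instead of this best-effort hint.
            String hint = (parent != null && !parent.canWrite())
                    ? ": parent directory is not writable" : "";
            throw new IOException("could not create " + dir + hint);
        }
    }

    public static void main(String[] args) throws IOException {
        File target = new File(System.getProperty("java.io.tmpdir"),
                "mkdirs-demo/a/b");
        mkdirsWithDiagnostics(target);
        System.out.println(target.isDirectory());
    }
}
```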
