[jira] [Commented] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197810#comment-14197810 ]

Hudson commented on HDFS-7218:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6449 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6449/])
HDFS-7218. FSNamesystem ACL operations should write to audit log on failure. (clamb via yliu) (yliu: rev 73e601259fed0646f115b09112995b51ffef3468)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

FSNamesystem ACL operations should write to audit log on failure

Key: HDFS-7218
URL: https://issues.apache.org/jira/browse/HDFS-7218
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch

Various ACL methods in FSNamesystem do not write to the audit log when the operation is not successful.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
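The fix pattern described in HDFS-7218 can be sketched in a minimal, self-contained form: the operation emits an audit event on both the success and the failure path, instead of only on success. All names here (`AuditSketch`, `logAuditEvent`, `setAcl`) are illustrative stand-ins, not the actual FSNamesystem code.

```java
import java.util.ArrayList;
import java.util.List;

public class AuditSketch {
    static final List<String> AUDIT = new ArrayList<>();

    // Stand-in for FSNamesystem's real audit logger.
    static void logAuditEvent(boolean succeeded, String cmd, String src) {
        AUDIT.add((succeeded ? "allowed=true" : "allowed=false")
            + " cmd=" + cmd + " src=" + src);
    }

    static void setAcl(String src, boolean failPermissionCheck) {
        try {
            if (failPermissionCheck) {
                throw new SecurityException("Permission denied: " + src);
            }
            // ... apply the ACL change here ...
            logAuditEvent(true, "setAcl", src);
        } catch (SecurityException e) {
            // The failure-path audit call is what the patch adds; previously
            // the exception propagated without any audit record.
            logAuditEvent(false, "setAcl", src);
            throw e;
        }
    }
}
```

The key point is that the audit write sits on both exits of the method, so a denied request leaves an `allowed=false` record rather than vanishing from the audit trail.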
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-7359:
--------------------------------
Attachment: HDFS-7359.1.patch

Here is a patch that fixes the bug by catching the error in {{GetJournalEditServlet}}. I considered just removing the addition of the SecondaryNameNode principal, since I've never heard of this usage in practice. However, that would arguably be a backwards-incompatible change if someone out there was running a non-HA cluster and had chosen to offload edits to the JournalNodes for consumption by the SecondaryNameNode. Catching the error is the safer change. {{TestSecureNNWithQJM}} is a new test suite that covers usage of QJM in a secured cluster. While I was working on this, I also spotted a typo in {{TestNNWithQJM}}, which I'm correcting in this patch.

NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.

Key: HDFS-7359
URL: https://issues.apache.org/jira/browse/HDFS-7359
Project: Hadoop HDFS
Issue Type: Bug
Components: journal-node
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-7359.1.patch

In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then the JournalNode operation fails, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes; there is even a check in SecondaryNameNode that aborts if HA is enabled.
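The defensive idea behind the HDFS-7359 fix can be illustrated with a small, self-contained helper: a malformed or missing host:port value is treated as "no address" rather than being allowed to throw and abort the caller. The class and method names below are hypothetical, not the actual {{GetJournalEditServlet}} code.

```java
public class AddressParseSketch {
    /**
     * Returns {host, port} when value looks like host:port, or null when it
     * cannot be interpreted as a network address (the case that previously
     * crashed the JournalNode check). Never throws.
     */
    static String[] tryParseAddress(String value) {
        if (value == null) return null;
        int colon = value.lastIndexOf(':');
        // Reject missing host, missing colon, or empty port.
        if (colon <= 0 || colon == value.length() - 1) return null;
        try {
            Integer.parseInt(value.substring(colon + 1));
        } catch (NumberFormatException e) {
            return null; // non-numeric port: not a network address
        }
        return new String[] { value.substring(0, colon), value.substring(colon + 1) };
    }
}
```

A caller building the set of allowed principals would simply skip the SecondaryNameNode entry when this returns null, instead of propagating an error up to NameNode startup.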
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-7359:
--------------------------------
Status: Patch Available (was: Open)

NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.

Key: HDFS-7359
URL: https://issues.apache.org/jira/browse/HDFS-7359
Project: Hadoop HDFS
Issue Type: Bug
Components: journal-node
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-7359.1.patch

In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then the JournalNode operation fails, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes; there is even a check in SecondaryNameNode that aborts if HA is enabled.
[jira] [Updated] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7218:
-------------------------
Resolution: Fixed
Fix Version/s: 2.6.0
Target Version/s: 2.6.0 (was: 2.7.0)
Status: Resolved (was: Patch Available)

Committed to trunk, branch-2, and branch-2.6. Thanks Charles for the contribution and Chris for the review.

FSNamesystem ACL operations should write to audit log on failure

Key: HDFS-7218
URL: https://issues.apache.org/jira/browse/HDFS-7218
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch

Various ACL methods in FSNamesystem do not write to the audit log when the operation is not successful.
[jira] [Commented] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197886#comment-14197886 ]

Konstantin Boudnik commented on HDFS-7333:
------------------------------------------

+1, the patch looks good (hopefully my expertise is sufficient for approving this?)

Improve log message in Storage.tryLock()

Key: HDFS-7333
URL: https://issues.apache.org/jira/browse/HDFS-7333
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, namenode
Affects Versions: 2.5.1
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Attachments: logging.patch

The log message in Storage.tryLock() is confusing: it talks about the NameNode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock().
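The improvement HDFS-7333 asks for can be shown with a tiny sketch: the failure message names the storage directory and carries the underlying exception, instead of a generic NameNode-specific string. The helper below is illustrative only, not the actual Storage.tryLock() code.

```java
import java.io.IOException;

public class LockMessageSketch {
    // Builds a message suitable for both NameNode and DataNode storage:
    // it names the directory that could not be locked and includes the cause.
    static String lockFailureMessage(String storageDir, IOException cause) {
        return "Cannot lock storage " + storageDir
            + ". The directory is already locked: " + cause.getMessage();
    }
}
```

In real logging code the exception would also be passed to the logger as a Throwable argument so the stack trace is preserved, rather than flattened into the message string.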
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197934#comment-14197934 ]

Hadoop QA commented on HDFS-7347:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12679475/HDFS-7347-20141104.patch
against trunk revision 73068f6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestBlockStoragePolicy
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8654//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8654//console

This message is automatically generated.

Configurable erasure coding policy for individual files and directories

Key: HDFS-7347
URL: https://issues.apache.org/jira/browse/HDFS-7347
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Attachments: HDFS-7347-20141104.patch

HDFS users and admins should be able to turn on and off erasure coding for individual files or directories.
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197961#comment-14197961 ]

Hadoop QA commented on HDFS-7279:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12679470/HDFS-7279.006.patch
against trunk revision 73068f6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.security.ssl.TestReloadingX509TrustManager
org.apache.hadoop.hdfs.TestFetchImage
org.apache.hadoop.hdfs.TestRollingUpgrade
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8653//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8653//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8653//console

This message is automatically generated.

Use netty to implement DatanodeWebHdfsMethods

Key: HDFS-7279
URL: https://issues.apache.org/jira/browse/HDFS-7279
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, webhdfs
Reporter: Haohui Mai
Assignee: Haohui Mai
Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch

Currently the DN implements all related webhdfs functionality using Jetty. Because the Jetty version the DN currently uses (Jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using Netty, which can be more efficient and allows finer-grained control over webhdfs.
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197962#comment-14197962 ]

Hadoop QA commented on HDFS-7359:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12679492/HDFS-7359.1.patch
against trunk revision 73e6012.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.web.TestWebHDFS
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8655//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8655//console

This message is automatically generated.

NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.

Key: HDFS-7359
URL: https://issues.apache.org/jira/browse/HDFS-7359
Project: Hadoop HDFS
Issue Type: Bug
Components: journal-node
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-7359.1.patch

In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then the JournalNode operation fails, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes; there is even a check in SecondaryNameNode that aborts if HA is enabled.
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198297#comment-14198297 ]

Hudson commented on HDFS-7334:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7334. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures. Contributed by Charles Lamb. (wheat9: rev d0449bd2fd0b03765bef78b2d7952b799f06575b)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures

Key: HDFS-7334
URL: https://issues.apache.org/jira/browse/HDFS-7334
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch

TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout.
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198303#comment-14198303 ]

Hudson commented on HDFS-7233:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7233. NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException. Contributed by Rushabh S Shah. (jing9: rev 5bd3a569f941ffcfc425a55288bec78a37a75aa1)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException

Key: HDFS-7233
URL: https://issues.apache.org/jira/browse/HDFS-7233
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.5.1
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
Fix For: 2.6.0
Attachments: HDFS-7233.patch

The NameNode logs the UnresolvedPathException even though the file exists in HDFS. Each time a symlink is accessed, the NN throws an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log; if left unfixed, the NN logs could grow very large, since every MR job on the cluster will access this symlink and cause a stack trace to be logged.
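The idea behind the HDFS-7233 fix is that UnresolvedPathException is an expected control-flow signal for symlink resolution, so the server should log it tersely (or not at all) rather than with a full stack trace. A minimal, self-contained sketch of that pattern follows; the class and helper names are hypothetical, not the actual NameNodeRpcServer code.

```java
import java.util.ArrayList;
import java.util.List;

public class TerseLoggingSketch {
    static final List<String> LOG = new ArrayList<>();
    // Exceptions in this set are "expected": logged as one line, no trace.
    static final List<String> TERSE = List.of("UnresolvedPathException");

    static void logServerException(Exception e) {
        String name = e.getClass().getSimpleName();
        if (TERSE.contains(name)) {
            LOG.add(name + ": " + e.getMessage());  // one line, no stack trace
        } else {
            LOG.add(name + " with stack trace");    // stand-in for a full trace
        }
    }

    // Local stand-in mirroring the exception named in the report.
    static class UnresolvedPathException extends Exception {
        UnresolvedPathException(String msg) { super(msg); }
    }
}
```

Hadoop's RPC server exposes an equivalent mechanism through a "terse exceptions" list, which is the kind of configuration the committed patch uses.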
[jira] [Commented] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198305#comment-14198305 ]

Hudson commented on HDFS-7218:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7218. FSNamesystem ACL operations should write to audit log on failure. (clamb via yliu) (yliu: rev 73e601259fed0646f115b09112995b51ffef3468)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

FSNamesystem ACL operations should write to audit log on failure

Key: HDFS-7218
URL: https://issues.apache.org/jira/browse/HDFS-7218
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch

Various ACL methods in FSNamesystem do not write to the audit log when the operation is not successful.
[jira] [Commented] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198307#comment-14198307 ]

Hudson commented on HDFS-7356:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7356. Use DirectoryListing.hasMore() directly in nfs. Contributed by Li Lu. (jing9: rev 27f106e2261d0dfdb04e3d08dfd84ca4fdfad244)
* hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Use DirectoryListing.hasMore() directly in nfs

Key: HDFS-7356
URL: https://issues.apache.org/jira/browse/HDFS-7356
Project: Hadoop HDFS
Issue Type: Improvement
Components: nfs
Reporter: Haohui Mai
Assignee: Li Lu
Priority: Minor
Fix For: 2.7.0
Attachments: HDFS-7356-110414.patch

In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false
    : (dlisting.getRemainingEntries() == 0);
{code}
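The simplification can be checked with a minimal model: since hasMore() is equivalent to getRemainingEntries() != 0, the ternary collapses to a single boolean expression. The DirectoryListing below is a tiny stand-in for the HDFS class, not the real one.

```java
public class ListingSketch {
    static class DirectoryListing {
        final int remaining;
        DirectoryListing(int remaining) { this.remaining = remaining; }
        int getRemainingEntries() { return remaining; }
        boolean hasMore() { return remaining != 0; }
    }

    // The original expression from RpcProgramNfs3, modeled verbatim.
    static boolean eofOld(int n, int length, DirectoryListing d) {
        return (n < length) ? false : (d.getRemainingEntries() == 0);
    }

    // The simplified form using hasMore() directly.
    static boolean eofNew(int n, int length, DirectoryListing d) {
        return n >= length && !d.hasMore();
    }
}
```

Both forms compute the same truth table, but the second reads as a direct statement of the condition: we are at EOF when the local batch is exhausted and the listing has no more entries.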
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198304#comment-14198304 ]

Hudson commented on HDFS-7355:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7355. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Contributed by Chris Nauroth. (wheat9: rev 99d710348a20ff99044207df4b92ab3bff31bd69)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.

Key: HDFS-7355
URL: https://issues.apache.org/jira/browse/HDFS-7355
Project: Hadoop HDFS
Issue Type: Test
Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Fix For: 2.6.0
Attachments: HDFS-7355.1.patch

{{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings.
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198308#comment-14198308 ]

Hudson commented on HDFS-7340:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7340. Make rollingUpgrade start/finalize idempotent. Contributed by Jing Zhao. (jing9: rev 3dfd6e68fe5028fe3766ae5056dc175c38cc97e1)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java

make rollingUpgrade start/finalize idempotent

Key: HDFS-7340
URL: https://issues.apache.org/jira/browse/HDFS-7340
Project: Hadoop HDFS
Issue Type: Bug
Components: ha
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
Fix For: 2.6.0
Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch

I was running this on an HA cluster with dfs.client.test.drop.namenode.response.number set to 1, so the first request goes through but the response is dropped. This causes the client to send another request, which fails with an error saying a request is already in progress. We should add retry cache support for this.
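Why a retry cache makes the operation idempotent can be shown with a small, self-contained sketch: when the first response is dropped in transit, the client retries with the same call id, and the server must return the cached result instead of rejecting the retry as a conflicting second request. All names here are illustrative; the real implementation lives in Hadoop's RetryCache and NameNodeRpcServer.

```java
import java.util.HashMap;
import java.util.Map;

public class RetryCacheSketch {
    // Maps a client call id to the response already produced for it.
    final Map<Long, String> cache = new HashMap<>();
    boolean upgradeInProgress = false;

    String startRollingUpgrade(long callId) {
        String cached = cache.get(callId);
        if (cached != null) {
            return cached;  // retry of a call whose response was lost
        }
        if (upgradeInProgress) {
            // A genuinely different caller while an upgrade is running.
            throw new IllegalStateException("rolling upgrade already in progress");
        }
        upgradeInProgress = true;
        String response = "rolling upgrade started";
        cache.put(callId, response);
        return response;
    }
}
```

Without the cache lookup, the retry in the reported scenario would hit the "already in progress" branch and fail, which is exactly the bug described above.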
[jira] [Commented] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198405#comment-14198405 ]

Hudson commented on HDFS-7356:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7356. Use DirectoryListing.hasMore() directly in nfs. Contributed by Li Lu. (jing9: rev 27f106e2261d0dfdb04e3d08dfd84ca4fdfad244)
* hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Use DirectoryListing.hasMore() directly in nfs

Key: HDFS-7356
URL: https://issues.apache.org/jira/browse/HDFS-7356
Project: Hadoop HDFS
Issue Type: Improvement
Components: nfs
Reporter: Haohui Mai
Assignee: Li Lu
Priority: Minor
Fix For: 2.7.0
Attachments: HDFS-7356-110414.patch

In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false
    : (dlisting.getRemainingEntries() == 0);
{code}
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198401#comment-14198401 ]

Hudson commented on HDFS-7233:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7233. NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException. Contributed by Rushabh S Shah. (jing9: rev 5bd3a569f941ffcfc425a55288bec78a37a75aa1)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java

NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException

Key: HDFS-7233
URL: https://issues.apache.org/jira/browse/HDFS-7233
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.5.1
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
Fix For: 2.6.0
Attachments: HDFS-7233.patch

The NameNode logs the UnresolvedPathException even though the file exists in HDFS. Each time a symlink is accessed, the NN throws an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log; if left unfixed, the NN logs could grow very large, since every MR job on the cluster will access this symlink and cause a stack trace to be logged.
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198402#comment-14198402 ]

Hudson commented on HDFS-7355:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7355. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Contributed by Chris Nauroth. (wheat9: rev 99d710348a20ff99044207df4b92ab3bff31bd69)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.

Key: HDFS-7355
URL: https://issues.apache.org/jira/browse/HDFS-7355
Project: Hadoop HDFS
Issue Type: Test
Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Fix For: 2.6.0
Attachments: HDFS-7355.1.patch

{{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings.
[jira] [Commented] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198403#comment-14198403 ]

Hudson commented on HDFS-7218:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7218. FSNamesystem ACL operations should write to audit log on failure. (clamb via yliu) (yliu: rev 73e601259fed0646f115b09112995b51ffef3468)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java

FSNamesystem ACL operations should write to audit log on failure

Key: HDFS-7218
URL: https://issues.apache.org/jira/browse/HDFS-7218
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch

Various ACL methods in FSNamesystem do not write to the audit log when the operation is not successful.
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198395#comment-14198395 ]

Hudson commented on HDFS-7334:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7334. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures. Contributed by Charles Lamb. (wheat9: rev d0449bd2fd0b03765bef78b2d7952b799f06575b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java

Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures

Key: HDFS-7334
URL: https://issues.apache.org/jira/browse/HDFS-7334
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch

TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout.
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198406#comment-14198406 ]

Hudson commented on HDFS-7340:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7340. Make rollingUpgrade start/finalize idempotent. Contributed by Jing Zhao. (jing9: rev 3dfd6e68fe5028fe3766ae5056dc175c38cc97e1)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

make rollingUpgrade start/finalize idempotent

Key: HDFS-7340
URL: https://issues.apache.org/jira/browse/HDFS-7340
Project: Hadoop HDFS
Issue Type: Bug
Components: ha
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
Fix For: 2.6.0
Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch

I was running this on an HA cluster with dfs.client.test.drop.namenode.response.number set to 1, so the first request goes through but the response is dropped. This causes the client to send another request, which fails with an error saying a request is already in progress. We should add retry cache support for this.
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198474#comment-14198474 ]

Rushabh S Shah commented on HDFS-7233:
--------------------------------------

Thanks [~jingzhao] for committing the patch.

NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException

Key: HDFS-7233
URL: https://issues.apache.org/jira/browse/HDFS-7233
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.5.1
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
Fix For: 2.6.0
Attachments: HDFS-7233.patch

The NameNode logs the UnresolvedPathException even though the file exists in HDFS. Each time a symlink is accessed, the NN throws an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log; if left unfixed, the NN logs could grow very large, since every MR job on the cluster will access this symlink and cause a stack trace to be logged.
[jira] [Commented] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198496#comment-14198496 ] Hudson commented on HDFS-7356: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7356. Use DirectoryListing.hasMore() directly in nfs. Contributed by Li Lu. (jing9: rev 27f106e2261d0dfdb04e3d08dfd84ca4fdfad244) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Haohui Mai Assignee: Li Lu Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7356-110414.patch In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}: {code} boolean eof = (n < fstatus.length) ? false : (dlisting.getRemainingEntries() == 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
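For illustration, the before/after of this simplification can be modeled without Hadoop on the classpath. `DirectoryListingSketch` below is an invented stand-in for {{DirectoryListing}}; only `hasMore()` and `getRemainingEntries()` mirror the real methods, and `eof()` shows that the rewritten condition is equivalent to the original ternary.

```java
// Invented stand-in for org.apache.hadoop.hdfs.protocol.DirectoryListing;
// only hasMore() and getRemainingEntries() mirror the real API.
public class DirectoryListingSketch {
    private final int remainingEntries;

    public DirectoryListingSketch(int remainingEntries) {
        this.remainingEntries = remainingEntries;
    }

    public int getRemainingEntries() {
        return remainingEntries;
    }

    // hasMore() is true when the listing was truncated and entries remain.
    public boolean hasMore() {
        return remainingEntries != 0;
    }

    // Before: eof = (n < fstatusLength) ? false : (getRemainingEntries() == 0)
    // After:  the same condition, stated directly via hasMore().
    public static boolean eof(int n, int fstatusLength, DirectoryListingSketch dlisting) {
        return n >= fstatusLength && !dlisting.hasMore();
    }
}
```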
[jira] [Commented] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198494#comment-14198494 ] Hudson commented on HDFS-7218: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7218. FSNamesystem ACL operations should write to audit log on failure. (clamb via yliu) (yliu: rev 73e601259fed0646f115b09112995b51ffef3468) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt FSNamesystem ACL operations should write to audit log on failure Key: HDFS-7218 URL: https://issues.apache.org/jira/browse/HDFS-7218 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch Various Acl methods in FSNamesystem do not write to the audit log when the operation is not successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
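The shape of such a fix can be sketched independently of FSNamesystem: move the audit call into a {{finally}} block so a failed permission check is logged too. Everything below (class, method, log format) is invented for the example, not Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Invented sketch of the audit-on-failure pattern; not FSNamesystem's
// actual signatures.
public class AuditSketch {
    public final List<String> auditLog = new ArrayList<>();

    private void logAuditEvent(boolean succeeded, String cmd, String src) {
        auditLog.add((succeeded ? "allowed=true" : "allowed=false")
                + " cmd=" + cmd + " src=" + src);
    }

    public void setAcl(String src, boolean failPermissionCheck) {
        boolean success = false;
        try {
            if (failPermissionCheck) {
                throw new SecurityException("Permission denied");
            }
            // ... perform the ACL modification here ...
            success = true;
        } finally {
            // The point of the fix: the audit entry is written on the
            // failure path too, not only on success.
            logAuditEvent(success, "setAcl", src);
        }
    }
}
```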
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198492#comment-14198492 ] Hudson commented on HDFS-7233: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7233. NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException. Contributed by Rushabh S Shah. (jing9: rev 5bd3a569f941ffcfc425a55288bec78a37a75aa1) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException --- Key: HDFS-7233 URL: https://issues.apache.org/jira/browse/HDFS-7233 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Fix For: 2.6.0 Attachments: HDFS-7233.patch Namenode logs the UnresolvedPathException even though that file exists in HDFS. Each time a symlink is accessed the NN will throw an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log, and we could end up with really large NN logs if we don't fix this, since every MR job on the cluster will access this symlink and cause a stacktrace to be logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198497#comment-14198497 ] Hudson commented on HDFS-7340: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7340. Make rollingUpgrade start/finalize idempotent. Contributed by Jing Zhao. (jing9: rev 3dfd6e68fe5028fe3766ae5056dc175c38cc97e1) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.6.0 Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
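As a rough sketch of the retry-cache idea (all names invented; Hadoop's real implementation is the RetryCache in the IPC layer), the server remembers each call's outcome keyed by call id, so a retried request replays the cached answer instead of failing with "already in progress":

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of retry-cache-backed idempotency; not NameNodeRpcServer code.
public class RetryCacheSketch {
    private final Map<Long, String> cache = new HashMap<>();
    private boolean upgradeInProgress = false;

    public String startRollingUpgrade(long callId) {
        String cached = cache.get(callId);
        if (cached != null) {
            return cached;          // retried call: replay the previous answer
        }
        if (upgradeInProgress) {
            throw new IllegalStateException("a rolling upgrade is already in progress");
        }
        upgradeInProgress = true;
        String result = "upgrade started";
        cache.put(callId, result);  // remember the outcome for retries
        return result;
    }
}
```

A retry whose response was dropped arrives with the same call id and gets the cached result; only a genuinely new call with a different id hits the in-progress check.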
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198493#comment-14198493 ] Hudson commented on HDFS-7355: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7355. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Contributed by Chris Nauroth. (wheat9: rev 99d710348a20ff99044207df4b92ab3bff31bd69) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 2.6.0 Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198486#comment-14198486 ] Hudson commented on HDFS-7334: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7334. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures. Contributed by Charles Lamb. (wheat9: rev d0449bd2fd0b03765bef78b2d7952b799f06575b) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198707#comment-14198707 ] Konstantin Shvachko commented on HDFS-7335: --- I am +1. TestBalancerWithNodeGroup failure is not related to the patch. Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7347: Attachment: HDFS-7347-20141105.patch This patch extends {{TestBlockStoragePolicy}} to be aware of the new {{EC}} policy. Thanks [~vinayrpet] for reviewing. [~jingzhao] Does the patch look OK to you (in the context of this HDFS-EC branch)? Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7347-20141104.patch, HDFS-7347-20141105.patch HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7335: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) I just committed this. Congratulations Milan! Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Fix For: 2.7.0 Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198736#comment-14198736 ] Hudson commented on HDFS-7335: -- FAILURE: Integrated in Hadoop-trunk-Commit #6452 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6452/]) HDFS-7335. Redundant checkOperation() in FSN.analyzeFileState(). Contributed by Milan Desai. (shv: rev 6e8722e49c29a19dd13e161001d2464bb1f22189) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Fix For: 2.7.0 Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198742#comment-14198742 ] Konstantin Shvachko commented on HDFS-7357: --- I don't see this patch committed to trunk. Only to branch-2. FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for in-complete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7333: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) I just committed this. Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch Confusing log message in Storage.tryLock(). It talks about namenode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7335: -- Hadoop Flags: Reviewed Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Fix For: 2.7.0 Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7333: -- Hadoop Flags: Reviewed Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch Confusing log message in Storage.tryLock(). It talks about namenode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198755#comment-14198755 ] Colin Patrick McCabe commented on HDFS-3107: Thanks, I will take a look at HDFS-7056. I suppose this means we can mark HDFS-7341 as a duplicate. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198760#comment-14198760 ] Hudson commented on HDFS-7333: -- FAILURE: Integrated in Hadoop-trunk-Commit #6453 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6453/]) HDFS-7333. Improve logging in Storage.tryLock(). Contributed by Konstantin Shvachko. (shv: rev 203c63030f625866e220656a8efdf05109dc7627) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch Confusing log message in Storage.tryLock(). It talks about namenode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198759#comment-14198759 ] Konstantin Shvachko commented on HDFS-7336: --- And there is an unused import of AtomicLong. Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198765#comment-14198765 ] Colin Patrick McCabe commented on HDFS-7314: HDFS-7314-2.patch just seems to rename {{abort}} to {{abortOpenFiles}}. What I was suggesting was creating a separate function, different from {{abort}}, which the {{LeaseRenewer}} would call. Actually, looking at it, I wonder if the lease renewer can just call {{closeAllFilesBeingWritten}}? I haven't looked at it in detail so maybe there's something else the lease renewer needs to do, but this at least looks like a good start. We don't need all this {{boolean removeFromFactory}} stuff. {{getInstance}} will re-add the {{DFSClient}} to the map later if needed. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314.patch It happened in YARN nodemanger scenario. But it could happen to any long running service that use cached instance of DistrbutedFileSystem. 1. Active NN is under heavy load. So it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. YARN nodemanager use DFSClient for certain write operation such as log aggregator or shared cache in YARN-1492. DFSClient used by YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc... 
java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given the callstack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers. * YARN closes the DistributedFileSystem object when it receives some well defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well. * DistributedFileSystem detects Aborted DFSClient and create a new instance of DFSClient. We will need to fix all the places DistributedFileSystem calls DFSClient. * After DFSClient gets into Aborted state, it doesn't have to reject all requests , instead it can retry. If NN is available again it can transition to healthy state. Comments? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
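The first option above (the application replacing its cached, aborted filesystem instance) might look roughly like the sketch below; the types are toy stand-ins, not the DistributedFileSystem/DFSClient API.

```java
// Toy model of "recreate the cached client after 'Filesystem closed'";
// class names and behavior are invented for illustration.
public class FsHandleSketch {
    public static class Client {
        private boolean aborted = false;

        public void abort() {
            aborted = true;
        }

        public String getFileInfo(String path) {
            if (aborted) {
                throw new IllegalStateException("Filesystem closed");
            }
            return "status:" + path;
        }
    }

    private Client client = new Client();

    public Client current() {
        return client;
    }

    // Retry once with a fresh client when the cached one was aborted.
    public String getFileInfo(String path) {
        try {
            return client.getFileInfo(path);
        } catch (IllegalStateException e) {
            client = new Client();   // replace the dead cached instance
            return client.getFileInfo(path);
        }
    }
}
```

This corresponds to the "close and recreate on a well-defined exception" option; the other two options would put the equivalent logic inside DistributedFileSystem or DFSClient itself.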
[jira] [Commented] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198766#comment-14198766 ] Haohui Mai commented on HDFS-7357: -- Thanks for the heads up -- I just pushed the missing commit to trunk. FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for in-complete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7199: --- Summary: DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception (was: DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception - Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198775#comment-14198775 ] Hudson commented on HDFS-7357: -- FAILURE: Integrated in Hadoop-trunk-Commit #6454 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6454/]) HDFS-7357. FSNamesystem.checkFileProgress should log file path. Contributed by Tsz Wo Nicholas Sze. (wheat9: rev 18312804e9c86c0ea6a259e288994fea6fa366ef) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for in-complete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7325) Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll
[ https://issues.apache.org/jira/browse/HDFS-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7325: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll - Key: HDFS-7325 URL: https://issues.apache.org/jira/browse/HDFS-7325 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7325.001.patch Currently ByteArrayManager wakes all waiting threads whenever a byte array is released and count == limit. However, only one thread can proceed. With a large number of waiters, this will cause a thundering herd problem. (See http://en.wikipedia.org/wiki/Thundering_herd_problem.) We should avoid this by only waking a single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7325) Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll
[ https://issues.apache.org/jira/browse/HDFS-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198776#comment-14198776 ] Colin Patrick McCabe commented on HDFS-7325: bq. The above should be =. One tricky thing here is that the patch moves this block after the {{numAllocated--}}. So I believe this should be correct... bq. How about simply including the change in HDFS-7358 and resolving this? OK. Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll - Key: HDFS-7325 URL: https://issues.apache.org/jira/browse/HDFS-7325 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7325.001.patch Currently ByteArrayManager wakes all waiting threads whenever a byte array is released and count == limit. However, only one thread can proceed. With a large number of waiters, this will cause a thundering herd problem. (See http://en.wikipedia.org/wiki/Thundering_herd_problem.) We should avoid this by only waking a single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
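The notify-vs-notifyAll point generalizes to any bounded counter; the sketch below is a generic illustration, not ByteArrayManager's actual code. Since each release() frees exactly one slot, waking a single waiter suffices.

```java
// Generic bounded counter illustrating why notify() beats notifyAll()
// when each release admits exactly one waiter (invented example, not
// ByteArrayManager).
public class BoundedCounterSketch {
    private final int limit;
    private int numAllocated = 0;

    public BoundedCounterSketch(int limit) {
        this.limit = limit;
    }

    public synchronized void allocate() throws InterruptedException {
        while (numAllocated >= limit) {
            wait();                 // block until a slot is released
        }
        numAllocated++;
    }

    public synchronized void release() {
        numAllocated--;
        // Only one waiter can use the freed slot, so waking one thread is
        // enough; notifyAll() would wake every waiter only for all but one
        // of them to go back to sleep (the thundering herd).
        notify();
    }

    public synchronized int allocated() {
        return numAllocated;
    }
}
```

Note this relies on every waiter waiting for the same condition; if threads could wait for different conditions on the same monitor, notifyAll() would be required to avoid lost wakeups.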
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198779#comment-14198779 ] Haohui Mai commented on HDFS-7359: -- It looks to me that simply removing the checks is equivalent to the current proposed patch, correct? NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198790#comment-14198790 ] Jing Zhao commented on HDFS-7359: - Removing those lines means we no longer recognize the SNN as a valid requestor. I guess in some scenario (maybe even in the future) we can still allow the SNN to download journals from the JNs. The current patch looks good to me. +1. I will commit it shortly. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198791#comment-14198791 ] Colin Patrick McCabe commented on HDFS-7017: bq. Unfortunately we are among the 0.0001% of users who disable memory overcommit. And we also observed the std::bad_alloc in our stress test. So it is important to not let the library die in this case and give the application an opportunity to handle it. For instance, the C API of libhdfs3 returns an error flag and sets errno to ENOMEM, and the application will abort the query to free the memory. Thanks, [~wangzw]. That is an interesting data point. Turning off memory overcommit tends not to work too well on UNIX, since when an application tries to fork(), the memory required doubles briefly. The new child process may never use any of that memory reservation (and copy-on-write means the overhead may be 0), but the system can't know that at the time the {{fork}} call is made. Even if the next thing the process wants to do is exec() a tiny program, a strict no-overcommit system (like Linux with certain configurations) will deny the fork(). This happens a lot in Hadoop because our big Java processes fork and exec small utility programs like groups, id, and so forth. We have been gradually adding JNI versions for all these use cases, but some still remain. bq. You are right that we should write a log message instead of exiting the lease renewer thread quietly. Adding another try ... catch block is a good suggestion. +1. [~wheat9], did you want to look at this before it gets committed? Let me know, otherwise I'll commit in a day or two.
Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
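The fork-and-exec pattern Colin describes can be seen in miniature with a short Java sketch. This is a hypothetical illustration, not Hadoop code; the class and method names are made up, and `echo` stands in for utilities like `groups` or `id`:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ForkExecDemo {
    // Runs a tiny external utility and returns its first line of output.
    // On Linux, process launches of this kind traditionally go through
    // fork()+exec(): under strict no-overcommit the kernel must momentarily
    // account for a full copy of the parent JVM's address space, even though
    // the child execs a small program immediately and touches almost none of it.
    public static String firstLineOf(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line;
        }
    }

    public static void main(String[] args) throws Exception {
        // The same shape Hadoop hits when shelling out to small helpers.
        System.out.println(firstLineOf("echo", "hello"));
    }
}
```

A large heap makes the transient reservation large, which is why big Java daemons are the ones that trip over a strict no-overcommit setting.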
[jira] [Updated] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7279: - Attachment: HDFS-7279.007.patch Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch Currently the DN implements all of its webhdfs functionality using jetty. Because the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers long latency and OOMs when its webhdfs component is under sustained heavy load. This jira proposes to reimplement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7359: Attachment: HDFS-7359.2.patch Here is patch v2. We need one more change in {{ImageServlet}} to prevent the problem from happening during bootstrapStandby. bq. It looks to me that simply removing the checks is equivalent to the current proposed patch, correct? bq. Removing those lines means we no longer recognize the SNN as a valid requestor. I guess in some scenarios (maybe even in the future) we could still allow the SNN to download journals from the JN. Thanks for reviewing, Haohui and Jing. Right, doing it this way preserves existing behavior if anyone out there is trying to use the SNN as requestor. It would be a little odd to do this, and I haven't seen it in practice, but I think it would be a backwards-incompatible change if we dropped it. Jing, are you still +1 for the v2 patch (pending a fresh Jenkins run)?
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198811#comment-14198811 ] Hadoop QA commented on HDFS-7279: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679588/HDFS-7279.007.patch against trunk revision 1831280. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8657//console This message is automatically generated.
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198816#comment-14198816 ] Jing Zhao commented on HDFS-7359: - Thanks for the update, Chris! I have a question about ImageServlet. Because ImageServlet is also used by the SecondaryNameNode for checkpointing, with the change in v2 is it possible that we can no longer detect a wrong configuration for the SNN during startup?
[jira] [Assigned] (HDFS-7329) MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails.
[ https://issues.apache.org/jira/browse/HDFS-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Byron Wong reassigned HDFS-7329: Assignee: Byron Wong MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails. --- Key: HDFS-7329 URL: https://issues.apache.org/jira/browse/HDFS-7329 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Byron Wong Labels: newbie When the createNameNodesAndSetConf() call fails, MiniDFSCluster logs an ERROR. It would be good to include the actual exception in the log; otherwise the real reason for the failure is obscured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
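The improvement requested in HDFS-7329 boils down to passing the caught exception to the logger so the stack trace survives. A minimal sketch, using `java.util.logging` rather than the commons-logging API Hadoop actually uses, and with a simulated failure in place of the real cluster startup:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogWithCause {
    private static final Logger LOG = Logger.getLogger("MiniDFSClusterSketch");

    // Hypothetical stand-in for createNameNodesAndSetConf(); returns true on success.
    static boolean startCluster() {
        try {
            throw new IllegalStateException("simulated NN startup failure");
        } catch (Exception e) {
            // Before: LOG.severe("Failed to start namenodes");  -- root cause lost.
            // After: pass the exception so the full stack trace is logged.
            LOG.log(Level.SEVERE, "Failed to start namenodes", e);
            return false;
        }
    }

    public static void main(String[] args) {
        startCluster();
    }
}
```

With commons-logging, the equivalent is the two-argument `LOG.error(msg, throwable)` overload instead of the message-only one.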
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198831#comment-14198831 ] Jing Zhao commented on HDFS-7347: - Yeah, the patch looks good to me. +1 Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7347-20141104.patch, HDFS-7347-20141105.patch HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198834#comment-14198834 ] Zhe Zhang commented on HDFS-7347: - [~jingzhao] Thanks for the review.
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198839#comment-14198839 ] Chris Nauroth commented on HDFS-7359: - That's a good question. I believe we'll still have debugging information in that case, thanks to this code in {{ImageServlet}}:
{code}
LOG.info("ImageServlet rejecting: " + remoteUser);
{code}
{code}
if (UserGroupInformation.isSecurityEnabled()
    && !isValidRequestor(context, request.getUserPrincipal().getName(), conf)) {
  String errorMsg = "Only Namenode, Secondary Namenode, and administrators may access "
      + "this servlet";
  response.sendError(HttpServletResponse.SC_FORBIDDEN, errorMsg);
  LOG.warn("Received non-NN/SNN/administrator request for image or edits from "
      + request.getUserPrincipal().getName() + " at " + request.getRemoteHost());
  throw new IOException(errorMsg);
}
{code}
I guess another possibility would be to change the new debug log message in the catch block to warn level and include the values of {{DFS_SECONDARY_NAMENODE_KERBEROS_PRINCIPAL_KEY}} and {{DFS_NAMENODE_SECONDARY_HTTP_ADDRESS_KEY}}. Let me know your thoughts, and if necessary, I can upload a v3. Thanks again!
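The shape of the change Chris is describing can be sketched as follows. This is a hypothetical illustration, not the HDFS-7359 patch: the class and method names are invented, a plain host:port split stands in for Hadoop's real `NetUtils.createSocketAddr`, and `System.err` stands in for the commons-logging warn call:

```java
import java.net.InetSocketAddress;

public class RequestorCheck {
    // If the configured secondary NameNode address cannot be parsed, log a
    // warning that names the offending configuration key and carry on treating
    // the SNN as an unrecognized requestor, instead of letting the parse
    // failure abort the JournalNode operation (and with it, NameNode startup).
    static InetSocketAddress tryParseSecondaryAddress(String configured) {
        try {
            int colon = configured.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("no port in " + configured);
            }
            return new InetSocketAddress(configured.substring(0, colon),
                    Integer.parseInt(configured.substring(colon + 1)));
        } catch (RuntimeException e) {
            System.err.println("WARN: cannot interpret "
                    + "dfs.namenode.secondary.http-address=" + configured
                    + " as a network address; skipping SecondaryNameNode principal ("
                    + e + ")");
            return null;  // caller simply omits the SNN from the valid-requestor set
        }
    }
}
```

A value like the literal string "null" (the failure mode reported in this issue) falls into the catch block and is logged rather than fatal.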
[jira] [Assigned] (HDFS-7330) Unclosed RandomAccessFile warnings in FSDatasetImpl.
[ https://issues.apache.org/jira/browse/HDFS-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai reassigned HDFS-7330: - Assignee: Milan Desai Unclosed RandomAccessFile warnings in FSDatasetImpl. --- Key: HDFS-7330 URL: https://issues.apache.org/jira/browse/HDFS-7330 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie The RandomAccessFile is opened as the underlying file for a FileInputStream, so it is closed when the stream is closed. To fix these two warnings (in getBlockInputStream() and getTmpInputStreams()) we just need to suppress them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7199: --- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Committed to trunk and 2.7. Thanks, [~shahrs87]. DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception - Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Fix For: 2.7.0 Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
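The failure mode described in HDFS-7199 (a background streamer dies from an unchecked exception, and a later close() mistakes the dead stream for an already-closed one) can be sketched with a simplified stand-in. This is not the actual DFSOutputStream code; the class and field names are illustrative:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class RecordingStream {
    // Record *any* Throwable, not just IOException: that is the essence of the
    // fix, since an unchecked exception previously shut the stream down
    // without leaving a trace for close() to report.
    private final AtomicReference<Throwable> lastException = new AtomicReference<>();
    private final Thread streamer;

    public RecordingStream(Runnable work) {
        streamer = new Thread(() -> {
            try {
                work.run();
            } catch (Throwable t) {
                lastException.compareAndSet(null, t);  // remember why we died
            }
        });
        streamer.start();
    }

    public void close() throws IOException {
        try {
            streamer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Throwable t = lastException.get();
        if (t != null) {
            // Surface the streamer's death instead of silently returning.
            throw new IOException("DataStreamer died", t);
        }
        // Only a clean shutdown is treated as an ordinary/redundant close.
    }
}
```

With lastException unset (the pre-patch behavior), the second branch would be taken even after a crash, and the caller would believe its data was safely written.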
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198855#comment-14198855 ] Haohui Mai commented on HDFS-7017: -- I'm with Colin in terms of {{std::bad_alloc}}. At this point I'm more concerned about the correctness of the code. Taking care of {{std::bad_alloc}} seems a pretty low priority to me, and it is still up for debate whether the exception itself should be used. Took a quick skim of the code. Some comments:
{code}
+class LeaseRenewer {
+public:
+    LeaseRenewer();
+    virtual ~LeaseRenewer();
+
+    virtual void StartRenew(shared_ptr<FileSystemImpl> filesystem) = 0;
+    virtual void StopRenew(shared_ptr<FileSystemImpl> filesystem) = 0;
+
+public:
+    static LeaseRenewer &GetLeaseRenewer();
+    static void CreateSingleton();
+
+private:
+    LeaseRenewer(const LeaseRenewer &other);
+    LeaseRenewer &operator=(const LeaseRenewer &other);
+
+    static once_flag once;
+    static shared_ptr<LeaseRenewer> renewer;
+};
{code}
It might be better to expose an {{instance()}} method directly in the class to reflect the fact that this is a singleton.
{code}
+LeaseRenewer::LeaseRenewer() {
+}
{code}
This is dead code.
{code}
+LeaseRenewerImpl::~LeaseRenewerImpl() {
+    stop = true;
+    cond.notify_all();
+
+    if (worker.joinable()) {
+        worker.join();
+    }
+}
{code}
It looks like the above code will never execute, as the LeaseRenewerImpl never gets freed.
{code}
+class LeaseRenewerImpl : public LeaseRenewer {
+public:
+    LeaseRenewerImpl();
+    ~LeaseRenewerImpl();
+    int getInterval() const;
+    void setInterval(int interval);
+    void StartRenew(shared_ptr<FileSystemImpl> filesystem);
+    void StopRenew(shared_ptr<FileSystemImpl> filesystem);
+
+private:
+    void renewer();
+
+private:
+    LeaseRenewerImpl(const LeaseRenewerImpl &other);
+    LeaseRenewerImpl &operator=(const LeaseRenewerImpl &other);
+
+    atomic<bool> stop;
+    condition_variable cond;
+    int interval;
+    mutex mut;
+    std::map<std::string, shared_ptr<FileSystemImpl>> maps;
+    thread worker;
+};
+}
{code}
Since {{LeaseRenewer}} is a private class / interface, it works better to combine {{LeaseRenewer}} and {{LeaseRenewerImpl}}.
{code}
+void OutputStreamImpl::append(const char *buf, int64_t size) {
{code}
Should {{size}} be unsigned? What is the maximum value of the size?
{code}
+void OutputStreamImpl::completeFile(bool throwError) {
{code}
You can return a {{Status}} object and let the caller decide whether to throw the exception.
{code}
+shared_ptr<Packet> PacketPool::getPacket(int pktSize, int chunksPerPkt,
+                                         int64_t offsetInBlock, int64_t seqno,
+                                         int checksumSize) {
+    if (packets.empty()) {
+        return shared_ptr<Packet>(new Packet(
+            pktSize, chunksPerPkt, offsetInBlock, seqno, checksumSize));
+    } else {
+        shared_ptr<Packet> retval = packets.front();
+        packets.pop_front();
+        retval->reset(pktSize, chunksPerPkt, offsetInBlock, seqno,
+                      checksumSize);
+        return retval;
+    }
+}
{code}
The pool might need to block to guard against overcommit (it can be addressed in a separate jira). And to really avoid the cost of allocation, the pool needs to be backed by untyped arenas. I suggest removing it for now to simplify the code.
Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
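The reviewer's suggestion of a single {{instance()}} accessor, in place of separate {{CreateSingleton()}} and {{GetLeaseRenewer()}} calls, is a standard lazy-singleton shape. A sketch of the idea in Java (the code under review is C++; this is only a language-neutral illustration, with invented names):

```java
public final class LeaseRenewerSketch {
    private LeaseRenewerSketch() { }  // no construction from outside

    // Initialization-on-demand holder idiom: the JVM guarantees HOLDER's
    // static initializer runs exactly once, on first use, with no explicit
    // locking -- the analogue of std::call_once / once_flag in the C++ patch.
    private static final class Holder {
        static final LeaseRenewerSketch INSTANCE = new LeaseRenewerSketch();
    }

    public static LeaseRenewerSketch instance() {
        return Holder.INSTANCE;
    }
}
```

Collapsing creation and lookup into one accessor also removes the possibility of calling the getter before the create step, which is one reason reviewers tend to prefer it.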
[jira] [Resolved] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-7199. Resolution: Fixed Fix Version/s: (was: 2.7.0) 2.6.0 Committed to 2.6
[jira] [Reopened] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reopened HDFS-7199:
[jira] [Created] (HDFS-7360) Test libhdfs3 against MiniDFSCluster
Haohui Mai created HDFS-7360: Summary: Test libhdfs3 against MiniDFSCluster Key: HDFS-7360 URL: https://issues.apache.org/jira/browse/HDFS-7360 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Priority: Critical Currently the branch has enough code to interact with HDFS servers. We should test the code against MiniDFSCluster to ensure the correctness of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198866#comment-14198866 ] Rushabh S Shah commented on HDFS-7199: -- Thanks [~cmccabe] for reviewing and committing the patch.
[jira] [Commented] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198869#comment-14198869 ] Hudson commented on HDFS-7199: -- FAILURE: Integrated in Hadoop-trunk-Commit #6455 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6455/]) HDFS-7199. DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception (rushabhs via cmccabe) (cmccabe: rev 56257fab1d5a7f66bebd9149c7df0436c0a57adb) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt CHANGES.txt. Move HDFS-7199 to branch-2.6 (cmccabe: rev 7b07acb0a51d20550f62ba29bf09120684b4097b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198867#comment-14198867 ] Jing Zhao commented on HDFS-7359: - bq. I guess another possibility would be to change the new debug log message in the catch block to warn level and include the values of DFS_SECONDARY_NAMENODE_KERBEROS_PRINCIPAL_KEY and DFS_NAMENODE_SECONDARY_HTTP_ADDRESS_KEY. Yeah, that will be helpful for debugging the issue. +1 after this change.
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7359: Attachment: HDFS-7359.3.patch Here is patch v3 with the improved logging. I still retained logging of the full stack trace at debug level in case we ever need to find that. Thanks again, Jing.
[jira] [Commented] (HDFS-7325) Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll
[ https://issues.apache.org/jira/browse/HDFS-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198900#comment-14198900 ] Tsz Wo Nicholas Sze commented on HDFS-7325: --- One tricky thing here is that the patch moves this block after the numAllocated--. ... Ah, you are correct. We actually do not need the if, since {{numAllocated < maxAllocated}} is always true at that point. Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll - Key: HDFS-7325 URL: https://issues.apache.org/jira/browse/HDFS-7325 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7325.001.patch Currently ByteArrayManager wakes all waiting threads whenever a byte array is released and count == limit. However, only one thread can proceed. With a large number of waiters, this will cause a thundering herd problem. (See http://en.wikipedia.org/wiki/Thundering_herd_problem.) We should avoid this by waking only a single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
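The notify-versus-notifyAll trade-off discussed above can be shown with a minimal bounded allocator. This is an illustrative sketch, not the actual ByteArrayManager code; the class and field names are invented:

```java
public class BoundedCounter {
    private final int maxAllocated;
    private int numAllocated;

    public BoundedCounter(int maxAllocated) {
        this.maxAllocated = maxAllocated;
    }

    public synchronized void allocate() throws InterruptedException {
        while (numAllocated >= maxAllocated) {
            wait();  // block until a slot is released
        }
        numAllocated++;
    }

    public synchronized void release() {
        numAllocated--;
        // Exactly one released slot means exactly one waiter can make
        // progress, so notify() suffices; notifyAll() would wake the whole
        // herd only for all but one thread to re-check the condition and
        // go back to sleep.
        notify();
    }

    public synchronized int allocated() {
        return numAllocated;
    }
}
```

The single-notify optimization is safe here only because every waiter waits for the same condition (a free slot); when different threads wait for different predicates on one monitor, notifyAll() remains necessary.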
[jira] [Updated] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7357: -- Description: There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. (was: There is a log message in FSNamesystem.checkFileProgress for in-complete blocks. However, the log message does not include the file path.) FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198928#comment-14198928 ] Jitendra Nath Pandey commented on HDFS-7359: +1
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198998#comment-14198998 ] Ming Ma commented on HDFS-7314: --- Thanks, Colin. Here are more explanations for the changes. Please let me know your thoughts. Appreciate your input. 1. {{abort}} is only used for this scenario. After we have {{LeaseRenewer}} call {{abortOpenFiles}}, {{abort}} won't be called by any functions. 2. In addition to having {{DFSClient}} call {{closeAllFilesBeingWritten}}, {{LeaseRenewer}} also needs to remove the {{DFSClient}} from its list via {{dfsclients.remove(dfsc);}} so that {{DFSClient}} doesn't renew the lease when there are no files open. This is achieved via {{LeaseRenewer}}'s {{closeClient}}. 3. Should {{LeaseRenewer}} be removed from the factory when it gets a SocketTimeoutException? Given that the {{LeaseRenewer}} thread won't exit when it gets a SocketTimeoutException as part of the fix, if the {{LeaseRenewer}} object is removed from the factory, then it could leak the {{LeaseRenewer}} thread even though the old {{LeaseRenewer}} object isn't used by other objects. In reality, {{LeaseRenewer}} won't be removed from the factory inside {{closeClient}} given that {{isRenewerExpired()}} will return false. So {{removeFromFactory}} is there mostly for the semantics, not out of necessity. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314.patch It happened in a YARN nodemanager scenario. But it could happen to any long running service that uses a cached instance of DistributedFileSystem. 1. Active NN is under heavy load. So it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. 
The YARN nodemanager uses DFSClient for certain write operations, such as the log aggregator or the shared cache in YARN-1492. The DFSClient used by YARN NM's renewLease RPC got a ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in the Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc... java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} We can make YARN or DFSClient more tolerant of temporary NN unavailability. Given the call stack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers. 
* YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well. * DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient. We will need to fix all the places where DistributedFileSystem calls DFSClient. * After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If the NN becomes available again, it can transition back to a healthy state. Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
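The second option listed above (DistributedFileSystem detecting an aborted DFSClient and creating a fresh one) can be sketched generically. The Client interface, FakeClient, and method names below are invented for illustration and are not the Hadoop API:

```java
import java.util.function.Supplier;

class RecreatingClientSketch {
  // Minimal stand-in for the DFSClient surface used here; invented for the sketch.
  interface Client {
    boolean isAborted();
    String getFileInfo(String path);
  }

  private final Supplier<Client> factory;
  private Client client;

  RecreatingClientSketch(Supplier<Client> factory) {
    this.factory = factory;
    this.client = factory.get();
  }

  // Before delegating, replace an aborted client with a fresh one instead of
  // letting every later call fail with "Filesystem closed".
  String getFileInfo(String path) {
    if (client.isAborted()) {
      client = factory.get();
    }
    return client.getFileInfo(path);
  }

  // Fake client for demonstration: aborted state fixed at construction.
  static class FakeClient implements Client {
    private final boolean aborted;
    FakeClient(boolean aborted) { this.aborted = aborted; }
    public boolean isAborted() { return aborted; }
    public String getFileInfo(String path) {
      if (aborted) throw new IllegalStateException("Filesystem closed");
      return "status:" + path;
    }
  }
}
```

The cost noted in the comment still applies: in the real code, every DistributedFileSystem call site that touches DFSClient would need this check.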
[jira] [Assigned] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai reassigned HDFS-7336: - Assignee: Milan Desai Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7336: -- Attachment: HDFS-7336.patch Removed buffersize parameter from DFSInputStream and DFSClient constructor/method signatures and fixed side effects. Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Attachments: HDFS-7336.patch {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7336: -- Status: Patch Available (was: In Progress) Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Attachments: HDFS-7336.patch {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7336 started by Milan Desai. - Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Attachments: HDFS-7336.patch {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199042#comment-14199042 ] Hadoop QA commented on HDFS-7336: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679618/HDFS-7336.patch against trunk revision bc80251. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8663//console This message is automatically generated. Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Attachments: HDFS-7336.patch {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7329) MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails.
[ https://issues.apache.org/jira/browse/HDFS-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Byron Wong updated HDFS-7329: - Status: Patch Available (was: Open) MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails. --- Key: HDFS-7329 URL: https://issues.apache.org/jira/browse/HDFS-7329 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Byron Wong Labels: newbie Attachments: HDFS-7329.patch When the createNameNodesAndSetConf() call fails, MiniDFSCluster logs an ERROR. It would be good to add the actual exception to the log. Otherwise the actual reason for the failure is obscured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7329) MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails.
[ https://issues.apache.org/jira/browse/HDFS-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Byron Wong updated HDFS-7329: - Attachment: HDFS-7329.patch Added patch. MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails. --- Key: HDFS-7329 URL: https://issues.apache.org/jira/browse/HDFS-7329 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Byron Wong Labels: newbie Attachments: HDFS-7329.patch When the createNameNodesAndSetConf() call fails, MiniDFSCluster logs an ERROR. It would be good to add the actual exception to the log. Otherwise the actual reason for the failure is obscured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
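A minimal sketch of the kind of change requested here (illustrative only, using java.util.logging rather than Hadoop's actual logging setup): attach the caught exception to the ERROR record so the failure reason is not obscured.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class ClusterStartupSketch {
  private static final Logger LOG = Logger.getLogger("MiniDFSCluster");

  static String startCluster(Runnable createNameNodesAndSetConf) {
    try {
      createNameNodesAndSetConf.run();
      return "started";
    } catch (RuntimeException e) {
      // The fix: log the exception object itself, not just a generic
      // message, so the stack trace and cause survive in the log.
      LOG.log(Level.SEVERE, "Failed to create name nodes and set conf", e);
      return "failed: " + e.getMessage();
    }
  }
}
```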
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199072#comment-14199072 ] Hadoop QA commented on HDFS-7347: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679577/HDFS-7347-20141105.patch against trunk revision a7fbd4e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.util.TestByteArrayManager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8656//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8656//console This message is automatically generated. Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7347-20141104.patch, HDFS-7347-20141105.patch HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199077#comment-14199077 ] stack commented on HDFS-7358: - See how we are 'Waiting for ack for: 42' twice in the log snippet below even though we wrote out seq no. 43. At about the same time the allocate/recycle numbering goes 'off', because we are waiting on an ack that never arrives, so there is an outstanding allocation with a corresponding recycle that will never come (Should + one.releaseBuffer(byteArrayManager); be inside a finally block?) If I run with one thread only, I don't see this issue; it happens only with two or more. My little program has 5 threads writing and calling sync. I turned this feature off and saw that we skip ack numbers from time to time, so this problem is not brought on by this feature, but you can't use this feature until it's fixed. Looking... {code} ... 2014-11-05 11:16:47,293 DEBUG [sync.0] util.ByteArrayManager: allocate(65565): count=43, aboveThreshold, [131072: 1/10, free=1], recycled? 
true 2014-11-05 11:16:47,293 DEBUG [sync.0] hdfs.DFSClient: DFSClient writeChunk allocating new packet seqno=41, src=/user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083, packetSize=65532, chunksPerPacket=127, bytesCurBlock=31232 2014-11-05 11:16:47,293 DEBUG [sync.0] hdfs.DFSClient: DFSClient flush() : bytesCurBlock 32088 lastFlushOffset 31579 2014-11-05 11:16:47,293 DEBUG [sync.0] hdfs.DFSClient: Queued packet 41 2014-11-05 11:16:47,293 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 41 2014-11-05 11:16:47,293 DEBUG [DataStreamer for file /user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083 block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] hdfs.DFSClient: DataStreamer block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940 sending packet packet seqno:41 offsetInBlock:31232 lastPacketInBlock:false lastByteOffsetInBlock: 32088 2014-11-05 11:16:47,294 DEBUG [ResponseProcessor for block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] hdfs.DFSClient: DFSClient seqno: 40 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 487791 2014-11-05 11:16:47,294 DEBUG [ResponseProcessor for block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] util.ByteArrayManager: recycle: array.length=131072, [131072: 2/10, free=0], freeQueue.offer, freeQueueSize=1 2014-11-05 11:16:47,294 DEBUG [ResponseProcessor for block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] hdfs.DFSClient: DFSClient seqno: 41 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 465086 2014-11-05 11:16:47,294 DEBUG [sync.1] util.ByteArrayManager: allocate(65565): count=44, aboveThreshold, [131072: 1/10, free=1], recycled? 
true 2014-11-05 11:16:47,295 DEBUG [sync.1] hdfs.DFSClient: DFSClient writeChunk allocating new packet seqno=42, src=/user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083, packetSize=65532, chunksPerPacket=127, bytesCurBlock=31744 2014-11-05 11:16:47,295 DEBUG [ResponseProcessor for block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] util.ByteArrayManager: recycle: array.length=131072, [131072: 2/10, free=0], freeQueue.offer, freeQueueSize=1 2014-11-05 11:16:47,295 DEBUG [sync.1] hdfs.DFSClient: DFSClient flush() : bytesCurBlock 32853 lastFlushOffset 32088 2014-11-05 11:16:47,295 DEBUG [sync.1] hdfs.DFSClient: Queued packet 42 2014-11-05 11:16:47,295 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 42 2014-11-05 11:16:47,295 DEBUG [DataStreamer for file /user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083 block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] hdfs.DFSClient: DataStreamer block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940 sending packet packet seqno:42 offsetInBlock:31744 lastPacketInBlock:false lastByteOffsetInBlock: 32853 2014-11-05 11:16:47,295 DEBUG [sync.0] util.ByteArrayManager: allocate(65565): count=45, aboveThreshold, [131072: 1/10, free=1], recycled? true 2014-11-05 11:16:47,295 DEBUG [sync.0] hdfs.DFSClient: DFSClient writeChunk allocating new packet seqno=43, src=/user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083, packetSize=65532, chunksPerPacket=127, bytesCurBlock=32768 2014-11-05 11:16:47,295 DEBUG [sync.0] hdfs.DFSClient: DFSClient flush() : bytesCurBlock 32853 lastFlushOffset 32853 2014-11-05 11:16:47,295 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 42 2014-11-05 11:16:47,296 DEBUG [ResponseProcessor for block
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199095#comment-14199095 ] Hadoop QA commented on HDFS-7359: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679589/HDFS-7359.2.patch against trunk revision 1831280. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.server.balancer.TestBalancer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8658//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8658//console This message is automatically generated. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. 
Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199124#comment-14199124 ] Chris Nauroth commented on HDFS-7359: - The test failures are unrelated. {{TestBalancer}} has been flaky. It's passing for me locally. The {{TestCheckpoint}} failure repros on current trunk even without this patch. We're still waiting on the Jenkins run for patch v3, which is currently in progress. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199141#comment-14199141 ] Tsz Wo Nicholas Sze commented on HDFS-7358: --- ... (Should + one.releaseBuffer(byteArrayManager); be inside a finally block?) ... You make a good point that the array may not be released when the pipeline eventually fails. We cannot call releaseBuffer(..) in a finally block since, for the usual error cases, the client will reconstruct the pipeline and retry sending the same packets. I will think about how to fix it. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199146#comment-14199146 ] Zhe Zhang commented on HDFS-7347: - {{TestByteArrayManager}} is unrelated and passes locally. Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7347-20141104.patch, HDFS-7347-20141105.patch HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199163#comment-14199163 ] stack commented on HDFS-7358: - Looking at packet sequence numbers, it seems like this is just how it works -- that a later seq number acks outstanding ones (I don't know enough to call it otherwise -- maybe you know [~szetszwo]?) -- and if so, we will have outstanding allocations and our counts will be off. Thanks. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
Chris Nauroth created HDFS-7361: --- Summary: TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Priority: Minor HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
[ https://issues.apache.org/jira/browse/HDFS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199171#comment-14199171 ] Chris Nauroth commented on HDFS-7361: - Here is the output from a failed test run. {code} Running org.apache.hadoop.hdfs.server.namenode.TestCheckpoint Tests run: 38, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 40.681 sec FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCheckpoint testStorageAlreadyLockedErrorMessage(org.apache.hadoop.hdfs.server.namenode.TestCheckpoint) Time elapsed: 0.079 sec FAILURE! java.lang.AssertionError: Log output does not contain expected log message: It appears that another namenode 28733@Chriss-MacBook-Pro.local has already locked the storage directory at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testStorageAlreadyLockedErrorMessage(TestCheckpoint.java:867) {code} TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. --- Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Priority: Minor HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199174#comment-14199174 ] Chris Nauroth commented on HDFS-7333: - This patch introduced a test failure in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}}. I filed HDFS-7361 to track it. [~shv], would you please take a look? Thank you. Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch The log message in Storage.tryLock() is confusing: it talks about the namenode, while this code is common to both NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
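As a rough sketch of what the improved message might look like (an invented helper, not the actual Storage.tryLock() code): it avoids naming only the namenode and includes both the directory path and the underlying exception text.

```java
import java.io.IOException;

class TryLockMessageSketch {
  static String lockFailureMessage(String storageDirPath, IOException cause) {
    // Speak of a generic "node" since this code serves both NameNode and
    // DataNode storage, and carry the path plus the original error text.
    return "It appears that another node has already locked the storage directory "
        + storageDirPath + ": " + cause.getMessage();
  }
}
```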
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199176#comment-14199176 ] Chris Nauroth commented on HDFS-7359: - The {{TestCheckpoint}} failure was introduced in HDFS-7333. I filed HDFS-7361 to track fixing it. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199182#comment-14199182 ] Tsz Wo Nicholas Sze commented on HDFS-7358: --- ... that a later seq number acks outstanding ones ... The pipeline expects an ack for every packet. It won't have acks with skipped seq nos. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199189#comment-14199189 ] stack commented on HDFS-7358: - Makes sense. There is a bug in DFSOutputStream then? I can get skipping of seq nos without this feature enabled. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7362) Proxy user refresh won't modify or remove existing groups or hosts from super user list
Eric Payne created HDFS-7362: Summary: Proxy user refresh won't modify or remove existing groups or hosts from super user list Key: HDFS-7362 URL: https://issues.apache.org/jira/browse/HDFS-7362 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Eric Payne Assignee: Eric Payne 2.x added a new DefaultImpersonationProvider class for reading the superuser configuration. In this class, once the host and group properties for a proxyuser are defined, they cannot be removed or modified without bouncing the daemon. As long as the config is updated correctly the first time, this problem won't manifest itself. Once defined, these properties don't tend to change. However, if the properties are mis-entered the first time, restarting the NN/RM/JHS/etc will be necessary to correctly re-read the config. An admin refresh won't do it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
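For context, the proxy-user settings in question live in properties shaped like the following (the user name and values here are made up). The reported behavior is that once DefaultImpersonationProvider has loaded such entries, modifying or removing them and then running an admin refresh such as -refreshSuperUserGroupsConfiguration does not take effect; only a daemon restart re-reads them.

```xml
<!-- Hypothetical proxyuser entries in core-site.xml. Per this report, if
     "host1.example.com" or "group1" is mis-entered the first time, an admin
     refresh will not correct it; the NN/RM/JHS must be restarted. -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>host1.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>group1</value>
</property>
```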
[jira] [Commented] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
[ https://issues.apache.org/jira/browse/HDFS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199209#comment-14199209 ] Konstantin Shvachko commented on HDFS-7361: --- Sure will fix this. Wonder why Jenkins didn't fail for HDFS-7333. TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. --- Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Priority: Minor HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7363) Pluggable algorithms to form block groups in erasure coding
Zhe Zhang created HDFS-7363: --- Summary: Pluggable algorithms to form block groups in erasure coding Key: HDFS-7363 URL: https://issues.apache.org/jira/browse/HDFS-7363 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199228#comment-14199228 ] Hadoop QA commented on HDFS-7359: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679600/HDFS-7359.3.patch against trunk revision bc80251. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCheckpoint The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.util.TestExactSizeInputStream org.apache.haTests org.apache.hadoop.hdfs.web.TestAuthFilter org.apache.hadoop.hdfs.web.TestWebTests org.apache.hadoop.hdfs.TesTests org.apacheTests org.apache.hadoop.hdfs.TestFSInputChecker org.apache.hadoop.hdfs.serveTests org.apache.hadoop.hdfs.server.Tests org.apache.hadoop.hdfs.sTests org.apache.hadoop.hdfs.server.namenode.TestNameNodeResourceChecker org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.TestClientReportBadBlock {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8661//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8661//console This message is automatically generated. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199229#comment-14199229 ] Hadoop QA commented on HDFS-7279: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679588/HDFS-7279.007.patch against trunk revision b4c951a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.TestRollingUpgrade org.apache.hadoop.hdfs.TestParallelUnixDomainRead The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.util.TestExactSizeInputStream org.apache.haTests org.apache.hadoop.hdfs.web.TestAuthFilter org.apache.hadoop.hdfs.web.TestWebTests org.apache.hadoop.hdfs.TesTests org.apacheTests org.apache.hadoop.hdfs.TestFSInputChecker org.apache.hadoop.hdfs.serveTests org.apache.hadoop.hdfs.server.Tests org.apache.hadoop.hdfs.sTests org.apache.hadoop.hdfs.server.namenode.TestNameNodeResourceChecker org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.TestClientReportBadBlock {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8660//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8660//console This message is automatically generated. Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch Currently the DN implements all related webhdfs functionality using jetty. As the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199246#comment-14199246 ] stack commented on HDFS-7358: - Without ByteArrayManager enabled, using tip of 2.6 logging at DEBUG level grepping 'Waiting for ack' I see us skipping packet seqnos. See below. See doubled '8', '22', and '26'. {code} 2014-11-05 14:08:57,240 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 1 2014-11-05 14:08:57,243 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 2 2014-11-05 14:08:57,245 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 3 2014-11-05 14:08:57,246 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 4 2014-11-05 14:08:57,246 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 5 2014-11-05 14:08:57,249 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 6 2014-11-05 14:08:57,250 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 7 2014-11-05 14:08:57,252 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 8 2014-11-05 14:08:57,252 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 8 2014-11-05 14:08:57,253 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 10 2014-11-05 14:08:57,254 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 11 2014-11-05 14:08:57,255 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 12 2014-11-05 14:08:57,255 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 13 2014-11-05 14:08:57,257 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 14 2014-11-05 14:08:57,258 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 15 2014-11-05 14:08:57,258 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 16 2014-11-05 14:08:57,259 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 17 2014-11-05 14:08:57,261 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 18 2014-11-05 14:08:57,262 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 19 2014-11-05 14:08:57,263 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 20 2014-11-05 14:08:57,264 DEBUG [sync.0] 
hdfs.DFSClient: Waiting for ack for: 21 2014-11-05 14:08:57,265 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 22 2014-11-05 14:08:57,265 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 22 2014-11-05 14:08:57,267 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 24 2014-11-05 14:08:57,267 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 25 2014-11-05 14:08:57,268 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 26 2014-11-05 14:08:57,268 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 26 2014-11-05 14:08:57,270 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 28 2014-11-05 14:08:57,270 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 29 2014-11-05 14:08:57,271 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 30 ... {code} Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
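The doubled seqnos in the log above (8, 22, 26) are easy to spot mechanically. A throwaway checker along these lines (a sketch for illustration, not part of any patch here) flags duplicated or skipped values in the "Waiting for ack for: N" DEBUG lines:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: scan DFSClient DEBUG log lines and report seqnos that repeat or
// skip, like the doubled 8/22/26 in the excerpt above.
public class SeqnoScan {
    static List<String> scan(List<String> lines) {
        Pattern p = Pattern.compile("Waiting for ack for: (\\d+)");
        List<String> anomalies = new ArrayList<>();
        long prev = 0;  // seqnos in the excerpt start at 1
        for (String line : lines) {
            Matcher m = p.matcher(line);
            if (!m.find()) {
                continue;  // not an ack-wait line
            }
            long seq = Long.parseLong(m.group(1));
            if (seq == prev) {
                anomalies.add("duplicate " + seq);
            } else if (seq != prev + 1) {
                anomalies.add("gap before " + seq);
            }
            prev = Math.max(prev, seq);
        }
        return anomalies;
    }
}
```

Run against the excerpt above, this reports a duplicate at each doubled seqno and a gap immediately after it (9, 23, and 27 never appear as waits).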
[jira] [Updated] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
[ https://issues.apache.org/jira/browse/HDFS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7361: -- Attachment: HDFS-7361.patch Here is the patch that fixes TestCheckpoint. Also wrapped long lines in testStorageAlreadyLockedErrorMessage(). TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. --- Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Priority: Minor Attachments: HDFS-7361.patch HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
[ https://issues.apache.org/jira/browse/HDFS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7361: -- Assignee: Konstantin Shvachko Status: Patch Available (was: Open) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. --- Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Assignee: Konstantin Shvachko Priority: Minor Attachments: HDFS-7361.patch HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199261#comment-14199261 ] Konstantin Shvachko commented on HDFS-7333: --- Sounds like Jenkins build is very much broken. It is one thing when a build gives you false negatives. But false positives make it broken, imho. Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch Confusing log message in Storage.tryLock(). It talks about namenode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7359: Hadoop Flags: Reviewed I think something confused the string parsing Jenkins does to search for timed out tests. I reviewed the console output, and I didn't see any evidence that these tests had timed out. I reran locally, and they were all fine. I'll commit this later today. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpeted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199264#comment-14199264 ] Hadoop QA commented on HDFS-7359: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679600/HDFS-7359.3.patch against trunk revision bc80251. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCheckpoint {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8662//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8662//console This message is automatically generated. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. 
One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
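The defensive direction described for the patch, catching the parse failure instead of letting it abort the JournalNode request, can be sketched in isolation (the helper below is illustrative only, not the actual GetJournalEditServlet code): treat an unparseable {{dfs.namenode.secondary.http-address}} as "no SecondaryNameNode principal to add" rather than as a fatal error.

```java
// Sketch: derive an optional SNN host from a configured "host:port" value,
// returning null instead of throwing when the value (e.g. the literal string
// "null") cannot be interpreted as a network address. The caller would then
// simply omit the SecondaryNameNode from the set of valid principals.
class SnnAddress {
    static String secondaryHostOrNull(String httpAddress) {
        if (httpAddress == null) {
            return null;
        }
        int colon = httpAddress.lastIndexOf(':');
        if (colon <= 0 || colon == httpAddress.length() - 1) {
            return null;  // no host:port shape at all
        }
        try {
            int port = Integer.parseInt(httpAddress.substring(colon + 1));
            if (port < 0 || port > 65535) {
                return null;  // not a usable port
            }
        } catch (NumberFormatException e) {
            return null;  // non-numeric port: not a network address
        }
        return httpAddress.substring(0, colon);
    }
}
```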