[jira] [Commented] (HDFS-10668) Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount

2016-07-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393336#comment-15393336
 ] 

Mingliang Liu commented on HDFS-10668:
--

Thanks for the commit!

> Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount
> -
>
> Key: HDFS-10668
> URL: https://issues.apache.org/jira/browse/HDFS-10668
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-10668.000.patch
>
>
> h6.Error Message
> {code}
> After delete one file expected:<4> but was:<5>
> {code}
> h6. Stacktrace
> {code}
> java.lang.AssertionError: After delete one file expected:<4> but was:<5>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.apache.hadoop.hdfs.server.datanode.TestDataNodeMXBean.testDataNodeMXBeanBlockCount(TestDataNodeMXBean.java:124)
> {code}
> For a sample failing Jenkins pre-commit build, see 
> [here|https://builds.apache.org/job/PreCommit-HDFS-Build/16094/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMXBean/testDataNodeMXBeanBlockCount/].
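
The assertion above (expected 4 blocks, saw 5 right after a delete) is consistent with a test that checks a counter before an asynchronous deletion has finished. The committed patch is not shown in this thread, so the following is only a minimal sketch of one common remedy: poll until the expected value appears or a timeout expires, similar in spirit to `GenericTestUtils.waitFor` in Hadoop's test utilities. The class name `WaitForCondition` and the `AtomicInteger` standing in for the DataNode's block count are hypothetical.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class WaitForCondition {
    /** Polls check every intervalMillis until it returns true or timeoutMillis elapses. */
    static void waitFor(BooleanSupplier check, long intervalMillis, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the block count reported by the DataNode MXBean.
        AtomicInteger blockCount = new AtomicInteger(5);

        // Simulates an asynchronous block deletion that completes a little later.
        Thread deleter = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            blockCount.decrementAndGet();
        });
        deleter.start();

        // Instead of asserting immediately, wait until the count converges.
        waitFor(() -> blockCount.get() == 4, 50, 5000);
        System.out.println("block count after delete: " + blockCount.get());
        deleter.join();
    }
}
```

Asserting only after the wait returns keeps the check strict (a count that never converges still fails, via the timeout) while removing the dependence on timing.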



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2016-07-25 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-9276:
-
Attachment: HDFS-9276.19.patch

Patch 19:
* Fix the unit test bug
* Bring back {{publicService}} related code

Passed unit tests.
Passed the {{HDFSReadLoop}} Spark test.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, HDFS-9276.14.patch, 
> HDFS-9276.15.patch, HDFS-9276.16.patch, HDFS-9276.17.patch, 
> HDFS-9276.18.patch, HDFS-9276.19.patch, HDFSReadLoop.scala, debug1.PNG, 
> debug2.PNG
>
>
> The scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. An HDFS Delegation Token (not a keytab or TGT) is used to communicate with 
> the NameNode.
> 4. We want to update the HDFS Delegation Token for long-running applications. 
> The HDFS client generates a private token for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, so they 
> eventually expire and requests fail.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
>     val keytab = "/path/to/keytab/xxx.keytab"
>     val principal = "x...@abc.com"
>     val creds1 = new org.apache.hadoop.security.Credentials()
>     val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>     // As the keytab user, fetch the initial delegation tokens into creds1
>     ugi1.doAs(new PrivilegedExceptionAction[Void] {
>       override def run(): Void = {
>         val fs = FileSystem.get(new Configuration())
>         fs.addDelegationTokens("test", creds1)
>         null
>       }
>     })
>     // Run as a remote user that holds only those delegation tokens
>     val ugi = UserGroupInformation.createRemoteUser("test")
>     ugi.addCredentials(creds1)
>     ugi.doAs(new PrivilegedExceptionAction[Void] {
>       override def run(): Void = {
>         var i = 0
>         while (true) {
>           // Every minute, fetch fresh delegation tokens via the keytab login
>           val creds1 = new org.apache.hadoop.security.Credentials()
>           val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>           ugi1.doAs(new PrivilegedExceptionAction[Void] {
>             override def run(): Void = {
>               val fs = FileSystem.get(new Configuration())
>               fs.addDelegationTokens("test", creds1)
>               null
>             }
>           })
>           // Add the new tokens to the current user, then access HDFS
>           UserGroupInformation.getCurrentUser.addCredentials(creds1)
>           val fs = FileSystem.get(new Configuration())
>           i += 1
>           println()
>           println(i)
>           println(fs.listFiles(new Path("/user"), false))
>           Thread.sleep(60 * 1000)
>         }
>         null
>       }
>     })
>   }
> }
> {code}
> To reproduce the bug, set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
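
The `10min` and `3min` values quoted above read as shorthand. In the stock `hdfs-default.xml`, these three properties are documented as millisecond values, so anyone reproducing the issue will likely need to convert them. The sketch below does only that conversion with `java.util.concurrent.TimeUnit`; the class name `DelegationTokenIntervals` is hypothetical and no Hadoop API is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

class DelegationTokenIntervals {
    /** Returns the reproduction settings from the report, converted to milliseconds. */
    static Map<String, Long> reproSettingsMillis() {
        Map<String, Long> settings = new LinkedHashMap<>();
        settings.put("dfs.namenode.delegation.token.max-lifetime",
                TimeUnit.MINUTES.toMillis(10));
        settings.put("dfs.namenode.delegation.key.update-interval",
                TimeUnit.MINUTES.toMillis(3));
        settings.put("dfs.namenode.delegation.token.renew-interval",
                TimeUnit.MINUTES.toMillis(3));
        return settings;
    }

    public static void main(String[] args) {
        // Prints each property with its millisecond value, ready to paste into a config.
        reproSettingsMillis().forEach((key, millis) ->
                System.out.println(key + " = " + millis));
    }
}
```

With these numbers the renew interval is 180000 ms, which matches the report's observation that the failure appears after about 3 minutes.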
> The stacktrace is:
> {code}
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.io.

[jira] [Updated] (HDFS-10633) DiskBalancer : Add the description for the new setting dfs.disk.balancer.plan.threshold.percent in HDFSDiskbalancer.md

2016-07-25 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10633:
-
Attachment: HDFS-10633.004.patch

The file hdfs-default.xml was updated in HDFS-10651, so I am posting a new patch 
rebased on the latest code.

> DiskBalancer : Add the description for the new setting 
> dfs.disk.balancer.plan.threshold.percent in HDFSDiskbalancer.md
> --
>
> Key: HDFS-10633
> URL: https://issues.apache.org/jira/browse/HDFS-10633
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.9.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-10633.001.patch, HDFS-10633.002.patch, 
> HDFS-10633.003.patch, HDFS-10633.004.patch
>
>
> HDFS-10600 introduced a new setting, 
> {{dfs.disk.balancer.plan.threshold.percent}}, in diskbalancer. This setting 
> controls whether any balancing is needed on the volume set. However, the new 
> setting is not yet documented in {{HDFSDiskbalancer.md}}.






[jira] [Commented] (HDFS-10668) Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount

2016-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393284#comment-15393284
 ] 

Hudson commented on HDFS-10668:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #10149 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10149/])
HDFS-10668. Fix intermittently failing UT (brahma: rev 7cac7655fd84ac394250705b31e3927fe548e34a)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMXBean.java


> Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount
> -
>
> Key: HDFS-10668
> URL: https://issues.apache.org/jira/browse/HDFS-10668
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-10668.000.patch
>
>






[jira] [Commented] (HDFS-10671) Fix typo in HdfsRollingUpgrade.md

2016-07-25 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393275#comment-15393275
 ] 

Yiqun Lin commented on HDFS-10671:
--

Thanks [~iwasakims] for the review and commit! Could you also help close this 
JIRA and set the fix version? Thanks.

> Fix typo in HdfsRollingUpgrade.md
> -
>
> Key: HDFS-10671
> URL: https://issues.apache.org/jira/browse/HDFS-10671
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Trivial
> Attachments: HDFS-10671.001.patch
>
>
> In document {{HdfsRollingUpgrade.md}},
> {quote}
> The namenodes can be upgraded independent of datanods and journal nodes.
> {quote}
> Here {{datanods}} should be {{datanodes}}.






[jira] [Updated] (HDFS-10668) Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount

2016-07-25 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-10668:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2 and branch-2.8. Thanks [~liuml07] for your 
contribution. Sorry for the delay.

> Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount
> -
>
> Key: HDFS-10668
> URL: https://issues.apache.org/jira/browse/HDFS-10668
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-10668.000.patch
>
>






[jira] [Commented] (HDFS-10620) StringBuilder created and appended even if logging is disabled

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393254#comment-15393254
 ] 

Hadoop QA commented on HDFS-10620:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 26s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 29s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s{color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 56s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 43s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 17s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 34s{color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 8s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_101. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}188m 17s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_101 Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| JDK v1.7.0_101 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:b59b8b7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12820079/HDFS-10620-branch-2.01.patch |
| JIRA Issue | HDFS-10620 |
| Optional Tests | asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle |
| uname | Linux cf47004c743b 3.13.0-36-lowlatency #63-Ubuntu SMP PRE
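
The HDFS-10620 title describes a `StringBuilder` being created and appended to even when logging is disabled. The patch itself is not shown in this message, so the following is only a minimal sketch of the usual remedy: check the log level before building the message. It uses `java.util.logging` rather than Hadoop's own logging facade, and `GuardedLogging`, `describe`, and `buildCount` are hypothetical names introduced for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedLogging {
    static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Counts how often the (potentially expensive) message is actually built.
    static int buildCount = 0;

    /** Builds the log message; this is the work we want to skip when logging is off. */
    static String describe(int[] blockIds) {
        buildCount++;
        StringBuilder sb = new StringBuilder("blocks:");
        for (int id : blockIds) {
            sb.append(' ').append(id);
        }
        return sb.toString();
    }

    /** Builds the message only if FINE-level logging is enabled for this logger. */
    static void logBlocks(int[] blockIds) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(blockIds));
        }
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3};

        LOG.setLevel(Level.INFO);   // FINE is disabled: no StringBuilder is created
        logBlocks(ids);
        System.out.println("builds with FINE disabled: " + buildCount);

        LOG.setLevel(Level.FINE);   // FINE is enabled: the message is built once
        logBlocks(ids);
        System.out.println("builds with FINE enabled: " + buildCount);
    }
}
```

The guard costs one cheap level check per call and avoids allocating and filling the builder on hot paths where the log line would be discarded anyway.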

[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2016-07-25 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393181#comment-15393181
 ] 

John Zhuge commented on HDFS-9276:
--

And because of the wrong unit test, changes in 17 and 18 were not properly 
validated. It looks like {{publicService}} is needed for {{PrivateToken}} 
because it does need 2 services: 1 for lookup and 1 for refreshing private 
tokens.

Learning from the mistake, I will test the next patch against my Spark program 
{{HDFSReadLoop.scala}} in addition to the unit test.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, HDFS-9276.14.patch, 
> HDFS-9276.15.patch, HDFS-9276.16.patch, HDFS-9276.17.patch, 
> HDFS-9276.18.patch, HDFSReadLoop.scala, debug1.PNG, debug2.PNG
>
>

[jira] [Commented] (HDFS-10667) Report more accurate info about data corruption location

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393171#comment-15393171
 ] 

Hadoop QA commented on HDFS-10667:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 24s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 24s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12820080/HDFS-10667.003.patch |
| JIRA Issue | HDFS-10667 |
| Optional Tests | asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle |
| uname | Linux ba3feead1b3b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 85a2050 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16184/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16184/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16184/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Report more accurate info about data corruption location
> 
>
> Key: HDFS-10667
> URL: https://issues.apache.org/jira/browse/HDFS-10667
> Project: Ha

[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2016-07-25 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393157#comment-15393157
 ] 

John Zhuge commented on HDFS-9276:
--

[~xiaochen], Thanks for the catch on the unit test! Patch 17 introduced the 
regression.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, HDFS-9276.14.patch, 
> HDFS-9276.15.patch, HDFS-9276.16.patch, HDFS-9276.17.patch, 
> HDFS-9276.18.patch, HDFSReadLoop.scala, debug1.PNG, debug2.PNG
>
>

[jira] [Commented] (HDFS-10642) Fix intermittently failing UT TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393111#comment-15393111
 ] 

Hadoop QA commented on HDFS-10642:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 77 unchanged - 1 fixed = 77 total (was 78) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 61m 0s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m 38s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12820073/HDFS-10642.001.patch |
| JIRA Issue | HDFS-10642 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux b8dd5a72b3e1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d383bfd |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16181/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16181/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Fix intermittently failing UT 
> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas
> ---
>
> Key: HDFS-10642
> URL: https://issues.apache.org/jira/browse/HDFS-10642
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10642.000.patch, HDFS-10642.0

[jira] [Commented] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393095#comment-15393095
 ] 

Hadoop QA commented on HDFS-10682:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 69 new + 79 unchanged - 41 fixed = 148 total (was 120) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 61m 14s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 85m 41s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12820070/HDFS-10682.003.patch |
| JIRA Issue | HDFS-10682 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux a1a445642174 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d383bfd |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16180/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16180/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16180/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop H

[jira] [Commented] (HDFS-10689) "hdfs dfs -chmod 777" does not remove sticky bit

2016-07-25 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393081#comment-15393081
 ] 

Manoj Govindassamy commented on HDFS-10689:
---

Whenever the leading digit of a numeric mode is omitted, it is treated as 0. 
The chmod man page clearly describes the expected behavior for the setuid and 
setgid bits when permissions are applied in numeric form.

man chmod:
{noformat}
SETUID AND SETGID BITS
   chmod clears the set-group-ID bit of a regular file if the file’s group ID
   does not match the user’s effective group ID or one of the user’s
   supplementary group IDs, unless the user has appropriate privileges.
   Additional restrictions may cause the set-user-ID and set-group-ID bits of
   MODE or RFILE to be ignored.  This behavior depends on the policy and
   functionality of the underlying chmod system call.  When in doubt, check
   the underlying system behavior.

   chmod preserves a directory’s set-user-ID and set-group-ID bits unless you
   explicitly specify otherwise.  *You can set or clear the bits with symbolic
   modes like u+s and g-s, and you can set (but not clear) the bits with a
   numeric mode.*
{noformat}

*That is, 755 will not reset the setuid and setgid bits on the file/directory.* 
However, the man page says nothing of that sort about the sticky bit. So I am 
leaning towards matching how other systems have implemented this. Here is the 
EXT4 behavior.

{noformat}
-bash-4.1$ df -T
Filesystem Type  1K-blocks  Used Available Use% Mounted on
/dev/xvda1 ext4  103209948  91990624   5977880  94% /

-bash-4.1$ pwd
/home/manojg

drwxrwxr-x 2 manojg manojg 4096 Jul 25 19:09 dir_test_sticky_bit
-bash-4.1$ chmod 1775 dir_test_sticky_bit
-bash-4.1$ ls -l
total 4
drwxrwxr-t 2 manojg manojg 4096 Jul 25 19:09 dir_test_sticky_bit

-bash-4.1$ chmod 775 dir_test_sticky_bit
-bash-4.1$ ls -l
total 4
drwxrwxr-x 2 manojg manojg 4096 Jul 25 19:09 dir_test_sticky_bit  <=== 775 does clear out sticky bit
-bash-4.1$ .
{noformat}

So EXT4 and many other filesystems on Linux and Mac OS X reset the sticky bit 
when it is not specified in the permission argument.
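The native behavior described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (OctalModeSketch, parseMode, applyNumericMode) are hypothetical and are not taken from the HDFS chmod implementation. The sketch treats an omitted leading digit as 0, so a three-digit mode clears an existing sticky bit.

```java
// Hypothetical sketch of native-chmod-like numeric mode handling.
// None of these names come from the HDFS code base.
final class OctalModeSketch {
    static final int STICKY = 01000;        // sticky bit
    static final int STICKY_AND_RWX = 01777; // bits a numeric mode fully specifies here

    /** Parses a 3- or 4-digit octal mode; a missing leading digit is treated as 0. */
    static int parseMode(String octal) {
        if (!octal.matches("[0-7]{3,4}")) {
            throw new IllegalArgumentException("not an octal mode: " + octal);
        }
        return Integer.parseInt(octal, 8);
    }

    /** Replaces the sticky and rwx bits of currentMode with the requested ones. */
    static int applyNumericMode(int currentMode, String octal) {
        int requested = parseMode(octal) & STICKY_AND_RWX;
        return (currentMode & ~STICKY_AND_RWX) | requested;
    }

    static boolean isSticky(int mode) {
        return (mode & STICKY) != 0;
    }

    public static void main(String[] args) {
        int mode = applyNumericMode(0755, "1755");
        System.out.println("after 1755, sticky = " + isSticky(mode));
        mode = applyNumericMode(mode, "755");
        System.out.println("after 755, sticky = " + isSticky(mode));
    }
}
```

With this model, "755" and "0755" are equivalent, which is the behavior the EXT4 session shows.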


> "hdfs dfs -chmod 777" does not remove sticky bit
> 
>
> Key: HDFS-10689
> URL: https://issues.apache.org/jira/browse/HDFS-10689
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Minor
>
> When a directory's permission is modified with the hdfs dfs -chmod command in 
> octal/numeric format, a mode without the leading sticky-bit digit does not 
> clear an existing sticky bit.
> 1. Create a dir dir_test_with_sticky_bit
> 2. Apply sticky bit permission on the dir : hdfs dfs -chmod 1755 
> /dir_test_with_sticky_bit
> 3. Remove sticky bit permission on the dir: hdfs dfs -chmod 755 
> /dir_test_with_sticky_bit
> Expected: the sticky bit is removed from the dir, as happens on native 
> Mac/Linux filesystems with native chmod.
> 4. However, removing the sticky bit by explicitly turning off the bit works: 
> hdfs dfs -chmod 0755 /dir_test_with_sticky_bit
> {noformat}
> manoj@~/work/hadev-pp: hdfs dfs -chmod 1755 /dir_test_with_sticky_bit
> manoj@~/work/hadev-pp: hdfs dfs -ls /
> Found 2 items
> drwxr-xr-t   - manoj supergroup  0 2016-07-25 11:42 
> /dir_test_with_sticky_bit
> drwxr-xr-x   - manoj supergroup  0 2016-07-25 11:42 /user
> manoj@~/work/hadev-pp: hdfs dfs -chmod 755 /dir_test_with_sticky_bit
> manoj@~/work/hadev-pp: hdfs dfs -ls /
> Found 2 items
> drwxr-xr-t   - manoj supergroup  0 2016-07-25 11:42 
> /dir_test_with_sticky_bit  <=== sticky bit still intact
> drwxr-xr-x   - manoj supergroup  0 2016-07-25 11:42 /user
> manoj@~/work/hadev-pp: hdfs dfs -chmod 0755 /dir_test_with_sticky_bit
> manoj@~/work/hadev-pp: hdfs dfs -ls /
> Found 2 items
> drwxr-xr-x   - manoj supergroup  0 2016-07-25 11:42 
> /dir_test_with_sticky_bit
> drwxr-xr-x   - manoj supergroup  0 2016-07-25 11:42 /user
> manoj@~/work/hadev-pp: 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393073#comment-15393073
 ] 

Hudson commented on HDFS-10301:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #10148 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10148/])
HDFS-10301. Interleaving processing of storages from repeated block (shv: rev 
85a20508bd04851d47c24b7562ec2927d5403446)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, 
> zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.
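The race in the description can be illustrated with a small, simplified model. This is a sketch under assumptions, not the BlockManager code: the class ZombieStorageSketch, its methods, and the storage names "s1" and "s2" are hypothetical. Each storage remembers the ID of the last report that touched it, and when a report finishes, any storage whose remembered ID differs is treated as a zombie.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the interleaving problem.
final class ZombieStorageSketch {
    /** Maps storage ID to the ID of the last block report that touched it. */
    private final Map<String, Long> lastReportId = new LinkedHashMap<>();

    void processStorage(long reportId, String storage) {
        lastReportId.put(storage, reportId);
    }

    /** Called after a report's final storage; returns storages deemed zombie. */
    List<String> finishReport(long reportId) {
        List<String> zombies = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
            if (e.getValue() != reportId) {
                zombies.add(e.getKey());
            }
        }
        return zombies;
    }

    /** Report 1 processed alone: no storage is flagged. */
    static List<String> inOrder() {
        ZombieStorageSketch nn = new ZombieStorageSketch();
        nn.processStorage(1L, "s1");
        nn.processStorage(1L, "s2");
        return nn.finishReport(1L);
    }

    /** Report 1 timed out and was resent as report 2; processing interleaves. */
    static List<String> interleaved() {
        ZombieStorageSketch nn = new ZombieStorageSketch();
        nn.processStorage(2L, "s1"); // retransmission starts
        nn.processStorage(1L, "s1"); // original report overwrites the ID on s1
        nn.processStorage(1L, "s2");
        nn.finishReport(1L);         // both storages carry ID 1 here
        nn.processStorage(2L, "s2");
        return nn.finishReport(2L);  // s1 still carries ID 1
    }
}
```

In the interleaved case the live storage "s1" is flagged only because the older report overwrote its ID, which mirrors the false zombie declaration the issue describes.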






[jira] [Updated] (HDFS-10667) Report more accurate info about data corruption location

2016-07-25 Thread Yuanbo Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-10667:
--
Attachment: HDFS-10667.003.patch

> Report more accurate info about data corruption location
> 
>
> Key: HDFS-10667
> URL: https://issues.apache.org/jira/browse/HDFS-10667
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs
>Reporter: Yongjun Zhang
>Assignee: Yuanbo Liu
> Attachments: HDFS-10667.001.patch, HDFS-10667.002.patch, 
> HDFS-10667.003.patch
>
>
> Per 
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15376897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15376897
> 129.77 report:
> {code}
> 2016-07-13 11:49:01,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving blk_1116167880_42906656 src: /10.6.134.229:43844 dest: 
> /10.6.129.77:5080
> 2016-07-13 11:49:01,543 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Checksum error in block blk_1116167880_42906656 from /10.6.134.229:43844
> org.apache.hadoop.fs.ChecksumException: Checksum error: 
> DFSClient_NONMAPREDUCE_2019484565_1 at 81920 exp: 1352119728 got: -1012279895
> at 
> org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native 
> Method)
> at 
> org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
> at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
> at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:421)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:558)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
> at java.lang.Thread.run(Thread.java:745)
> 2016-07-13 11:49:01,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exception for blk_1116167880_42906656
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
> Unexpected checksum mismatch while writing blk_1116167880_42906656 from 
> /10.6.134.229:43844
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:571)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> and
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15378879&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15378879
> {quote}
> While verifying only packet, the position mentioned in the checksum 
> exception, is relative to packet buffer offset, not the block offset. So 
> 81920 is the offset in the exception.
> {quote}
> Create this jira to report more accurate corruption location information: the 
> offset in the file, offset in block, and offset in packet.
> See 
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15387083&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15387083
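The three offsets requested in the description relate to each other by simple addition. The sketch below is illustrative only: the class and parameter names are hypothetical, only the value 81920 comes from the log above, and the file offset assumes every preceding block is completely full at a fixed block size.

```java
// Hypothetical helper showing how the three offsets relate.
final class CorruptionOffsetSketch {
    /** Offset in block = where the packet starts in the block + offset inside the packet. */
    static long offsetInBlock(long packetStartInBlock, long offsetInPacket) {
        return packetStartInBlock + offsetInPacket;
    }

    /** Offset in file, assuming all earlier blocks are full blocks of blockSize bytes. */
    static long offsetInFile(long blockIndex, long blockSize, long offsetInBlock) {
        return blockIndex * blockSize + offsetInBlock;
    }

    public static void main(String[] args) {
        long offsetInPacket = 81920L;            // position reported by ChecksumException
        long packetStartInBlock = 1048576L;      // assumed for illustration
        long blockSize = 128L * 1024 * 1024;     // assumed block size
        long inBlock = offsetInBlock(packetStartInBlock, offsetInPacket);
        long inFile = offsetInFile(2L, blockSize, inBlock); // assumed third block of the file
        System.out.println("packet=" + offsetInPacket + " block=" + inBlock + " file=" + inFile);
    }
}
```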






[jira] [Updated] (HDFS-10620) StringBuilder created and appended even if logging is disabled

2016-07-25 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-10620:
-
Target Version/s: 2.8.0, 3.0.0-alpha2  (was: 3.0.0-alpha2)

> StringBuilder created and appended even if logging is disabled
> --
>
> Key: HDFS-10620
> URL: https://issues.apache.org/jira/browse/HDFS-10620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.4
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10620-branch-2.01.patch, HDFS-10620.001.patch, 
> HDFS-10620.002.patch
>
>
> In BlockManager.addToInvalidates the StringBuilder is appended to during the 
> delete even if logging isn't active.
> Could avoid allocating the StringBuilder as well, but not sure if it is 
> really worth it to add null handling in the code.
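A minimal sketch of the guard being discussed, assuming java.util.logging in place of the logger Hadoop actually uses. The class LazyLogSketch and its method are hypothetical and do not reproduce BlockManager.addToInvalidates. The sketch shows the stricter variant mentioned in the description, where the StringBuilder is not even allocated when the level is disabled, at the cost of a null check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration of guarding StringBuilder work behind a level check.
final class LazyLogSketch {
    private static final Logger LOG = Logger.getLogger("LazyLogSketch");

    static {
        // Pin the level so the example does not depend on external configuration.
        LOG.setLevel(Level.INFO);
    }

    /** Returns the logged line, or null when the level is disabled and nothing was built. */
    static String invalidate(List<String> datanodes, Level level) {
        boolean enabled = LOG.isLoggable(level);
        StringBuilder names = enabled ? new StringBuilder() : null;
        for (String dn : datanodes) {
            // The per-datanode invalidation work would happen here.
            if (enabled) {
                names.append(dn).append(' ');
            }
        }
        if (!enabled) {
            return null;
        }
        String line = "addToInvalidates: " + names.toString().trim();
        LOG.log(level, line);
        return line;
    }

    public static void main(String[] args) {
        List<String> dns = Arrays.asList("dn1", "dn2");
        System.out.println(invalidate(dns, Level.FINE)); // level disabled, nothing built
        System.out.println(invalidate(dns, Level.INFO));
    }
}
```

Whether the saved allocation justifies the extra null handling is the open question raised in the description; the sketch only shows what that handling would look like.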






[jira] [Created] (HDFS-10689) "hdfs dfs -chmod 777" does not remove sticky bit

2016-07-25 Thread Manoj Govindassamy (JIRA)
Manoj Govindassamy created HDFS-10689:
-

 Summary: "hdfs dfs -chmod 777" does not remove sticky bit
 Key: HDFS-10689
 URL: https://issues.apache.org/jira/browse/HDFS-10689
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
Priority: Minor



When a directory's permission is modified with the hdfs dfs -chmod command in 
octal/numeric format, a mode without the leading sticky-bit digit does not 
clear an existing sticky bit.

1. Create a dir dir_test_with_sticky_bit
2. Apply sticky bit permission on the dir : hdfs dfs -chmod 1755 
/dir_test_with_sticky_bit
3. Remove sticky bit permission on the dir: hdfs dfs -chmod 755 
/dir_test_with_sticky_bit

Expected: the sticky bit is removed from the dir, as happens on native 
Mac/Linux filesystems with native chmod.

4. However, removing the sticky bit by explicitly turning off the bit works: 
hdfs dfs -chmod 0755 /dir_test_with_sticky_bit

{noformat}
manoj@~/work/hadev-pp: hdfs dfs -chmod 1755 /dir_test_with_sticky_bit
manoj@~/work/hadev-pp: hdfs dfs -ls /
Found 2 items
drwxr-xr-t   - manoj supergroup  0 2016-07-25 11:42 
/dir_test_with_sticky_bit
drwxr-xr-x   - manoj supergroup  0 2016-07-25 11:42 /user

manoj@~/work/hadev-pp: hdfs dfs -chmod 755 /dir_test_with_sticky_bit
manoj@~/work/hadev-pp: hdfs dfs -ls /
Found 2 items
drwxr-xr-t   - manoj supergroup  0 2016-07-25 11:42 
/dir_test_with_sticky_bit  <=== sticky bit still intact
drwxr-xr-x   - manoj supergroup  0 2016-07-25 11:42 /user

manoj@~/work/hadev-pp: hdfs dfs -chmod 0755 /dir_test_with_sticky_bit
manoj@~/work/hadev-pp: hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - manoj supergroup  0 2016-07-25 11:42 
/dir_test_with_sticky_bit
drwxr-xr-x   - manoj supergroup  0 2016-07-25 11:42 /user
manoj@~/work/hadev-pp: 
{noformat}
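For readers comparing the listings above, the difference between drwxr-xr-t and drwxr-xr-x is only the sticky bit (octal 1000). The following sketch, with a hypothetical class name and no connection to the HDFS FsPermission code, shows how an ls-style permission string can be derived from the mode bits.

```java
// Hypothetical renderer for ls-style permission strings.
final class PermissionStringSketch {
    static String render(int mode, boolean isDirectory) {
        StringBuilder sb = new StringBuilder(isDirectory ? "d" : "-");
        String rwx = "rwxrwxrwx";
        for (int i = 0; i < 9; i++) {
            // Bit 8 is owner-read, bit 0 is other-execute.
            boolean set = (mode & (1 << (8 - i))) != 0;
            sb.append(set ? rwx.charAt(i) : '-');
        }
        if ((mode & 01000) != 0) {
            // Sticky bit replaces the last character: 't' if other-execute is set, else 'T'.
            sb.setCharAt(9, (mode & 01) != 0 ? 't' : 'T');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(01755, true));
        System.out.println(render(0755, true));
    }
}
```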









[jira] [Commented] (HDFS-10667) Report more accurate info about data corruption location

2016-07-25 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393065#comment-15393065
 ] 

Yuanbo Liu commented on HDFS-10667:
---

[~yzhangal] Sure, I've uploaded v3 patch per your comment.

> Report more accurate info about data corruption location
> 
>
> Key: HDFS-10667
> URL: https://issues.apache.org/jira/browse/HDFS-10667
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs
>Reporter: Yongjun Zhang
>Assignee: Yuanbo Liu
> Attachments: HDFS-10667.001.patch, HDFS-10667.002.patch
>
>
> Per 
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15376897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15376897
> 129.77 report:
> {code}
> 2016-07-13 11:49:01,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving blk_1116167880_42906656 src: /10.6.134.229:43844 dest: 
> /10.6.129.77:5080
> 2016-07-13 11:49:01,543 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Checksum error in block blk_1116167880_42906656 from /10.6.134.229:43844
> org.apache.hadoop.fs.ChecksumException: Checksum error: 
> DFSClient_NONMAPREDUCE_2019484565_1 at 81920 exp: 1352119728 got: -1012279895
> at 
> org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native 
> Method)
> at 
> org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
> at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
> at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:421)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:558)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
> at java.lang.Thread.run(Thread.java:745)
> 2016-07-13 11:49:01,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exception for blk_1116167880_42906656
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
> Unexpected checksum mismatch while writing blk_1116167880_42906656 from 
> /10.6.134.229:43844
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:571)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> and
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15378879&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15378879
> {quote}
> While verifying only packet, the position mentioned in the checksum 
> exception, is relative to packet buffer offset, not the block offset. So 
> 81920 is the offset in the exception.
> {quote}
> Create this jira to report more accurate corruption location information: the 
> offset in the file, offset in block, and offset in packet.
> See 
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15387083&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15387083






[jira] [Updated] (HDFS-10620) StringBuilder created and appended even if logging is disabled

2016-07-25 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-10620:
-
Status: Patch Available  (was: Reopened)

> StringBuilder created and appended even if logging is disabled
> --
>
> Key: HDFS-10620
> URL: https://issues.apache.org/jira/browse/HDFS-10620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.4
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10620-branch-2.01.patch, HDFS-10620.001.patch, 
> HDFS-10620.002.patch
>
>
> In BlockManager.addToInvalidates the StringBuilder is appended to during the 
> delete even if logging isn't active.
> Could avoid allocating the StringBuilder as well, but not sure if it is 
> really worth it to add null handling in the code.






[jira] [Updated] (HDFS-10620) StringBuilder created and appended even if logging is disabled

2016-07-25 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-10620:
-
Attachment: HDFS-10620-branch-2.01.patch

> StringBuilder created and appended even if logging is disabled
> --
>
> Key: HDFS-10620
> URL: https://issues.apache.org/jira/browse/HDFS-10620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.4
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10620-branch-2.01.patch, HDFS-10620.001.patch, 
> HDFS-10620.002.patch
>
>
> In BlockManager.addToInvalidates the StringBuilder is appended to during the 
> delete even if logging isn't active.
> Could avoid allocating the StringBuilder as well, but not sure if it is 
> really worth it to add null handling in the code.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393060#comment-15393060
 ] 

Hadoop QA commented on HDFS-10301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} | {color:red} HDFS-10301 does not apply to branch-2. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12820078/HDFS-10301.branch-2.patch |
| JIRA Issue | HDFS-10301 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16182/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, 
> zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block 
> report. It then sends the block report again. While processing these two 
> reports at the same time, the NameNode can interleave the processing of 
> storages from different reports. This screws up the blockReportId field, 
> which makes the NameNode think that some storages are zombies. Replicas from 
> zombie storages are immediately removed, causing missing blocks.
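
To make the interleaving concrete, here is a toy model of the failure. It is not NameNode code. It assumes a mechanism the description only hints at: each storage remembers the ID of the last block report that touched it, and when a report finishes, any storage carrying a different ID is treated as a zombie. All class, method and storage names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model, not NameNode code: every name here is hypothetical. */
final class ZombieStorageDemo {
  // storage ID -> ID of the last block report that touched that storage
  private final Map<String, Long> lastReportId = new LinkedHashMap<>();

  /**
   * Processes one storage of a report. When it is the report's last storage,
   * returns the storages whose recorded report ID differs (the "zombies").
   */
  List<String> processStorage(long reportId, String storage, boolean lastInReport) {
    lastReportId.put(storage, reportId);
    List<String> zombies = new ArrayList<>();
    if (lastInReport) {
      for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
        if (e.getValue() != reportId) {
          zombies.add(e.getKey());
        }
      }
    }
    return zombies;
  }

  /** Report 1 is fully processed before its retransmission (report 2). */
  static List<String> inOrder() {
    ZombieStorageDemo nn = new ZombieStorageDemo();
    List<String> zombies = new ArrayList<>();
    zombies.addAll(nn.processStorage(1, "s1", false));
    zombies.addAll(nn.processStorage(1, "s2", true));
    zombies.addAll(nn.processStorage(2, "s1", false));
    zombies.addAll(nn.processStorage(2, "s2", true));
    return zombies;
  }

  /** The retransmission (report 2) is processed in the middle of report 1. */
  static List<String> interleaved() {
    ZombieStorageDemo nn = new ZombieStorageDemo();
    nn.processStorage(1, "s1", false); // original report, first storage
    nn.processStorage(2, "s1", false); // retransmission overwrites s1's ID
    nn.processStorage(2, "s2", true);  // retransmission finishes cleanly
    return nn.processStorage(1, "s2", true); // original finishes: s1 looks stale
  }
}
```

In this model, inOrder() flags nothing, while interleaved() flags the healthy storage s1 as a zombie when the original report completes.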






[jira] [Reopened] (HDFS-10620) StringBuilder created and appended even if logging is disabled

2016-07-25 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reopened HDFS-10620:
--

> StringBuilder created and appended even if logging is disabled
> --
>
> Key: HDFS-10620
> URL: https://issues.apache.org/jira/browse/HDFS-10620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.4
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10620.001.patch, HDFS-10620.002.patch
>
>
> In BlockManager.addToInvalidates the StringBuilder is appended to during the 
> delete even if logging isn't active.
> Could avoid allocating the StringBuilder as well, but not sure if it is 
> really worth it to add null handling in the code.






[jira] [Commented] (HDFS-10620) StringBuilder created and appended even if logging is disabled

2016-07-25 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393059#comment-15393059
 ] 

Akira Ajisaka commented on HDFS-10620:
--

Reverted. Thanks [~jzhuge]! I'll upload a patch for the backport.

> StringBuilder created and appended even if logging is disabled
> --
>
> Key: HDFS-10620
> URL: https://issues.apache.org/jira/browse/HDFS-10620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.4
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10620.001.patch, HDFS-10620.002.patch
>
>
> In BlockManager.addToInvalidates the StringBuilder is appended to during the 
> delete even if logging isn't active.
> Could avoid allocating the StringBuilder as well, but not sure if it is 
> really worth it to add null handling in the code.






[jira] [Updated] (HDFS-10620) StringBuilder created and appended even if logging is disabled

2016-07-25 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-10620:
-
Fix Version/s: (was: 2.8.0)

> StringBuilder created and appended even if logging is disabled
> --
>
> Key: HDFS-10620
> URL: https://issues.apache.org/jira/browse/HDFS-10620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.4
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10620.001.patch, HDFS-10620.002.patch
>
>
> In BlockManager.addToInvalidates the StringBuilder is appended to during the 
> delete even if logging isn't active.
> Could avoid allocating the StringBuilder as well, but not sure if it is 
> really worth it to add null handling in the code.






[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-10301:
---
Attachment: HDFS-10301.branch-2.patch

I just committed this to trunk. Congratulations [~redvine]!
Also ported to branch-2 and branch-2.8.
Will keep it open while a port to branch-2.7 / 6 is in the works.


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, 
> zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block 
> report. It then sends the block report again. While processing these two 
> reports at the same time, the NameNode can interleave the processing of 
> storages from different reports. This screws up the blockReportId field, 
> which makes the NameNode think that some storages are zombies. Replicas from 
> zombie storages are immediately removed, causing missing blocks.






[jira] [Commented] (HDFS-10620) StringBuilder created and appended even if logging is disabled

2016-07-25 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393056#comment-15393056
 ] 

Akira Ajisaka commented on HDFS-10620:
--

Thank you for the comment. I'll revert this commit from branch-2 and branch-2.8 
shortly.

> StringBuilder created and appended even if logging is disabled
> --
>
> Key: HDFS-10620
> URL: https://issues.apache.org/jira/browse/HDFS-10620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.4
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10620.001.patch, HDFS-10620.002.patch
>
>
> In BlockManager.addToInvalidates the StringBuilder is appended to during the 
> delete even if logging isn't active.
> Could avoid allocating the StringBuilder as well, but not sure if it is 
> really worth it to add null handling in the code.






[jira] [Commented] (HDFS-10629) Federation Router

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393052#comment-15393052
 ] 

Hadoop QA commented on HDFS-10629:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 49s{color} | {color:green} HDFS-10467 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} | {color:green} HDFS-10467 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 39s{color} | {color:green} HDFS-10467 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  2s{color} | {color:green} HDFS-10467 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 18s{color} | {color:green} HDFS-10467 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  5s{color} | {color:green} HDFS-10467 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  1s{color} | {color:green} HDFS-10467 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 52s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs generated 1 new + 36 unchanged - 0 fixed = 37 total (was 36) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 36s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 201 new + 644 unchanged - 0 fixed = 845 total (was 644) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  6s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 50s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 42s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m 38s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be 
atomic in 
org.apache.hadoop.hdfs.server.federation.router.ConnectionManager.getConnection(UserGroupInformation,
 String)  At ConnectionManager.java:may not be atomic in 
org.apache.hadoop.hdfs.server.federation.router.ConnectionManager.getConnection(UserGroupInformation,
 String)  At ConnectionManager.java:[line 240] |
|  |  Integer is incompatible with expected argument type String in 
org.apache.hadoop.hdfs.server.federation.router.ConnectionManager$ConnectionPool$CleanupTask.run()
  At ConnectionManager.java:argument type String in 
org.apache.hadoop.hdfs.server.federation.router.ConnectionManager$ConnectionPool$CleanupTask.run()
  At ConnectionManager.java:[line 176] |
|  |  
org.apache.hadoop.hdfs.server.federation.router.Router.initAndStartRouter(Configuration,
 boolean) invokes System.exit(...), which shuts down the entire virtual machine 
 At Router.java:shuts down the entire virtual machine  At Router.java:[line 
190] |
|  |  Redundant nullcheck of connection, which is known to be non-null in 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer$DefaultLookup.getClient()
  Redundant null check at RouterRpcServer.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServ

[jira] [Commented] (HDFS-10678) Documenting NNThroughputBenchmark tool

2016-07-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393048#comment-15393048
 ] 

Mingliang Liu commented on HDFS-10678:
--

Thanks for your prompt comment, [~shv].

Yes, it's reasonable to do as you suggested. After that we can extend the 
"Benchmark Tools" page to cover more tools.

> Documenting NNThroughputBenchmark tool
> --
>
> Key: HDFS-10678
> URL: https://issues.apache.org/jira/browse/HDFS-10678
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: benchmarks, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> The best (only) documentation for the NNThroughputBenchmark currently exists 
> as a JavaDoc on the NNThroughputBenchmark class. This is less than useful, 
> especially since we no longer generate javadocs for HDFS as part of the build 
> process. I suggest we extract it into a separate markdown doc, or merge it 
> with other benchmarking materials (if any?) about HDFS.






[jira] [Commented] (HDFS-10620) StringBuilder created and appended even if logging is disabled

2016-07-25 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393046#comment-15393046
 ] 

John Zhuge commented on HDFS-10620:
---

Hi [~ajisakaa], when you merged the fix to branch-2 and branch-2.8, you 
mistakenly replaced "&&" with "&":
{code}
  private void addToInvalidates(Block b) {
    ...
    if (datanodes != null & datanodes.length() != 0) {
      blockLog.debug("BLOCK* addToInvalidates: {} {}", b, datanodes);
    }
  }
{code}
This resulted in {{TestFailoverWithBlockTokensEnabled}} test failures.
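
For readers skimming the thread, the difference between the two operators can be shown in isolation. This is a standalone sketch with hypothetical names, and it assumes the builder can be null on some path, which the comment above does not state explicitly. In Java, "&" on booleans evaluates both operands, so the length() call still runs when the reference is null, whereas "&&" short-circuits.

```java
/** Standalone illustration with hypothetical names; not the BlockManager code. */
final class ShortCircuitDemo {

  /** Non-short-circuit form: both operands are evaluated, so null throws. */
  static boolean nonEmptyEager(StringBuilder sb) {
    return sb != null & sb.length() != 0;
  }

  /** Short-circuit form: the right operand is skipped when sb is null. */
  static boolean nonEmptyShortCircuit(StringBuilder sb) {
    return sb != null && sb.length() != 0;
  }
}
```

Calling nonEmptyEager(null) throws a NullPointerException, while nonEmptyShortCircuit(null) simply returns false. For a non-null builder the two forms give the same result.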

> StringBuilder created and appended even if logging is disabled
> --
>
> Key: HDFS-10620
> URL: https://issues.apache.org/jira/browse/HDFS-10620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.4
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10620.001.patch, HDFS-10620.002.patch
>
>
> In BlockManager.addToInvalidates the StringBuilder is appended to during the 
> delete even if logging isn't active.
> Could avoid allocating the StringBuilder as well, but not sure if it is 
> really worth it to add null handling in the code.






[jira] [Commented] (HDFS-10671) Fix typo in HdfsRollingUpgrade.md

2016-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393041#comment-15393041
 ] 

Hudson commented on HDFS-10671:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #10147 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10147/])
HDFS-10671. Fix typo in HdfsRollingUpgrade.md. Contributed by Yiqun Lin. 
(iwasakims: rev 59466b8c180716dda7aa670728580a88e54eb4d2)
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsRollingUpgrade.md


> Fix typo in HdfsRollingUpgrade.md
> -
>
> Key: HDFS-10671
> URL: https://issues.apache.org/jira/browse/HDFS-10671
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Trivial
> Attachments: HDFS-10671.001.patch
>
>
> In document {{HdfsRollingUpgrade.md}},
> {quote}
> The namenodes can be upgraded independent of datanods and journal nodes.
> {quote}
> Here {{datanods}} should be {{datanodes}}.






[jira] [Commented] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393042#comment-15393042
 ] 

Hudson commented on HDFS-10688:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #10147 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10147/])
HDFS-10688. BPServiceActor may run into a tight loop for sending block (jing9: 
rev 0cde9e12a7175e4d8bc4ccd5c36055b280d1fbd6)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java


> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Fix For: 2.8.0
>
> Attachments: HDFS-10688.001.patch, HDFS-10688.002.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception while doing a Kerberos realm lookup. The 
> tight loop ultimately caused the DataNode to send hundreds of DNS lookup 
> queries per second.
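
One way to avoid the tight loop is to pause in the IOException branch as well, using the same bound as the RemoteException branch. The sketch below is a standalone illustration under that assumption; it is not the committed patch, and the class, interfaces and parameters are hypothetical. The sleep is injected so that the behaviour can be checked without real delays.

```java
import java.io.IOException;

/** Illustrative sketch with hypothetical names; not the committed patch. */
final class RetryWithPause {

  /** The work done on each loop iteration; may fail with a local IOException. */
  interface Task {
    void run() throws IOException;
  }

  /** Abstraction over Thread.sleep so callers and tests can control the pause. */
  interface Sleeper {
    void sleep(long millis) throws InterruptedException;
  }

  /**
   * Runs the task up to maxAttempts times and pauses after every IOException
   * instead of retrying immediately. Returns the number of pauses taken.
   */
  static int runLoop(Task task, int maxAttempts, long heartbeatMillis, Sleeper sleeper) {
    int pauses = 0;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        task.run();
      } catch (IOException e) {
        // Same bound as the RemoteException branch: at most one second.
        long sleepTime = Math.min(1000, heartbeatMillis);
        try {
          sleeper.sleep(sleepTime);
          pauses++;
        } catch (InterruptedException ie) {
          // Preserve the interrupt status and stop looping.
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    return pauses;
  }
}
```

In real use the last argument would be Thread::sleep. With a three-second heartbeat interval, a task that always fails is retried about once per second rather than as fast as the CPU allows.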






[jira] [Commented] (HDFS-10678) Documenting NNThroughputBenchmark tool

2016-07-25 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393030#comment-15393030
 ] 

Konstantin Shvachko commented on HDFS-10678:


Hey [~liuml07]. Documenting NNThroughputBenchmark is a good idea.
We should probably add it to Hadoop documentation, since JavaDocs are not 
published. We can add a new section "Benchmark Tools" on the documentation page:
http://hadoop.apache.org/docs/current/
For now it will document NNThroughputBenchmark. Later we can add docs for 
TestDFSIO, Slive.
Would it be reasonable to write a markdown doc based on existing JavaDoc?

> Documenting NNThroughputBenchmark tool
> --
>
> Key: HDFS-10678
> URL: https://issues.apache.org/jira/browse/HDFS-10678
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: benchmarks, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> The best (only) documentation for the NNThroughputBenchmark currently exists 
> as a JavaDoc on the NNThroughputBenchmark class. This is less than useful, 
> especially since we no longer generate javadocs for HDFS as part of the build 
> process. I suggest we extract it into a separate markdown doc, or merge it 
> with other benchmarking materials (if any?) about HDFS.






[jira] [Updated] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-10688:
-
Description: 
Currently in BPServiceActor#offerService, when the DataNode runs into a local 
IOException, it only logs the exception and immediately re-enters the while 
loop:
{code}
  } catch(RemoteException re) {
    ...
    LOG.warn("RemoteException in offerService", re);
    try {
      long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
      Thread.sleep(sleepTime);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  } catch (IOException e) {
    LOG.warn("IOException in offerService", e);
  }
{code}

This tight loop can cause problems. For example, in a production cluster, we 
saw a DataNode hit an exception while doing a Kerberos realm lookup. The tight 
loop ultimately caused the DataNode to send hundreds of DNS lookup queries per 
second.

  was:
Currently in BPServiceActor#offerService, when the DataNode runs into a local 
IOException, it only logs the exception and immediately re-enters the while 
loop:
{code}
  } catch(RemoteException re) {
    ...
    LOG.warn("RemoteException in offerService", re);
    try {
      long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
      Thread.sleep(sleepTime);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  } catch (IOException e) {
    LOG.warn("IOException in offerService", e);
  }
{code}

This tight loop can cause problems. For example, in a production cluster, we 
saw a DataNode hit an exception while doing a Kerberos realm lookup. The tight 
loop ultimately caused the DataNode to send hundreds of DNS lookup queries.


> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Fix For: 2.8.0
>
> Attachments: HDFS-10688.001.patch, HDFS-10688.002.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception while doing a Kerberos realm lookup. The 
> tight loop ultimately caused the DataNode to send hundreds of DNS lookup 
> queries per second.






[jira] [Commented] (HDFS-10519) Add a configuration option to enable in-progress edit log tailing

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393014#comment-15393014
 ] 

Hadoop QA commented on HDFS-10519:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  9s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  6s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 968 unchanged - 0 fixed = 970 total (was 968) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 23s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 18s{color} | {color:green} bkjournal in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 55s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSShell |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12820056/HDFS-10519.011.patch |
| JIRA Issue | HDFS-10519 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  cc  xml  |
| uname | Linux d135ec04e615 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d383bfd |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16179/artifact/patchprocess/diff-checkstyle-hadoop-hdf

[jira] [Updated] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-10688:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk, branch-2 and branch-2.8. Thanks 
[~vagarychen] for the contribution!

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Fix For: 2.8.0
>
> Attachments: HDFS-10688.001.patch, HDFS-10688.002.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception while doing a Kerberos realm lookup. The 
> tight loop ultimately caused the DataNode to send hundreds of DNS lookup 
> queries.






[jira] [Commented] (HDFS-10671) Fix typo in HdfsRollingUpgrade.md

2016-07-25 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393012#comment-15393012
 ] 

Masatake Iwasaki commented on HDFS-10671:
-

+1, committing this.

> Fix typo in HdfsRollingUpgrade.md
> -
>
> Key: HDFS-10671
> URL: https://issues.apache.org/jira/browse/HDFS-10671
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Trivial
> Attachments: HDFS-10671.001.patch
>
>
> In document {{HdfsRollingUpgrade.md}},
> {quote}
> The namenodes can be upgraded independent of datanods and journal nodes.
> {quote}
> Here {{datanods}} should be {{datanodes}}.






[jira] [Updated] (HDFS-10677) Über-jira: Enhancements to NNThroughputBenchmark tool

2016-07-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10677:
-
Target Version/s: 2.8.0  (was: 3.0.0-alpha2)

> Über-jira: Enhancements to NNThroughputBenchmark tool
> -
>
> Key: HDFS-10677
> URL: https://issues.apache.org/jira/browse/HDFS-10677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: benchmarks, tools
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>







[jira] [Commented] (HDFS-10586) Erasure Code misfunctions when 3 DataNode down

2016-07-25 Thread gao shan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393010#comment-15393010
 ] 

gao shan commented on HDFS-10586:
-

The MR job failed and exited.

> Erasure Code misfunctions when 3 DataNode down
> --
>
> Key: HDFS-10586
> URL: https://issues.apache.org/jira/browse/HDFS-10586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha1
> Environment: 9 DataNodes and 1 NameNode; the erasure coding policy is 
> set to "6-3". When 3 DataNodes are down, erasure coding fails and an exception 
> is thrown
>Reporter: gao shan
>
> Steps to reproduce:
> 1) hadoop fs -mkdir /ec
> 2) Set the erasure coding policy to "6-3".
> 3) Write data with:
> time hadoop jar 
> /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar
>   TestDFSIO -D test.build.data=/ec -write -nrFiles 30 -fileSize 12288 
> -bufferSize 1073741824
> 4) Manually take down 3 nodes by killing the "datanode" and "nodemanager" 
> processes on 3 DataNodes.
> 5) Read the erasure-coded data with:
> time hadoop jar 
> /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar
>   TestDFSIO -D test.build.data=/ec -read -nrFiles 30 -fileSize 12288 
> -bufferSize 1073741824
> The read then fails with the following exception:
> INFO mapreduce.Job: Task Id : attempt_1465445965249_0008_m_34_2, Status : 
> FAILED
> Error: java.io.IOException: 4 missing blocks, the stripe is: Offset=0, 
> length=8388608, fetchedChunksNum=0, missingChunksNum=4
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.checkMissingBlocks(DFSStripedInputStream.java:614)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readParityChunks(DFSStripedInputStream.java:647)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:762)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:316)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:450)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:941)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:531)
>   at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:508)
>   at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:134)
>   at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:37)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
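
To make the numbers in the report easier to reason about: assuming the "6-3" policy is a Reed-Solomon layout with 6 data units and 3 parity units per stripe (the report does not name the codec, so this is an assumption), a stripe can be rebuilt only while at most 3 of its 9 units are missing. The trace reports missingChunksNum=4, which is one more than that bound. {{StripeToleranceSketch}} and {{canReconstruct}} are hypothetical names used only for illustration.

```java
/** Hypothetical helper illustrating the tolerance bound; not part of Hadoop. */
final class StripeToleranceSketch {

  /**
   * For an MDS code such as Reed-Solomon, any dataUnits of the
   * (dataUnits + parityUnits) units in a stripe are enough to decode,
   * so at most parityUnits units may be missing.
   */
  static boolean canReconstruct(int dataUnits, int parityUnits, int missingUnits) {
    if (dataUnits <= 0 || parityUnits < 0 || missingUnits < 0
        || missingUnits > dataUnits + parityUnits) {
      throw new IllegalArgumentException("invalid stripe layout");
    }
    return missingUnits <= parityUnits;
  }
}
```

Under that assumption, {{canReconstruct(6, 3, 3)}} holds while {{canReconstruct(6, 3, 4)}} does not, so the interesting question for this issue is why the reader counts 4 missing units when only 3 DataNodes were taken down.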






[jira] [Updated] (HDFS-10642) Fix intermittently failing UT TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas

2016-07-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10642:
-
Attachment: HDFS-10642.001.patch

The v1 patch logs the exception while waiting for the storage type matching 
condition.

> Fix intermittently failing UT 
> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas
> ---
>
> Key: HDFS-10642
> URL: https://issues.apache.org/jira/browse/HDFS-10642
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10642.000.patch, HDFS-10642.001.patch
>
>
> See [example stack trace | 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/10001/testReport/org.apache.hadoop.hdfs.server.datanode.fsdataset.impl/TestLazyPersistReplicaRecovery/testDnRestartWithSavedReplicas/].
> h6. Error Message
> {code}
>  Expected: is 
>  but: was 
> {code}
> h6. Stacktrace
> {code}
> java.lang.AssertionError: 
> Expected: is 
>  but: was 
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.junit.Assert.assertThat(Assert.java:865)
>   at org.junit.Assert.assertThat(Assert.java:832)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:141)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:53)
> {code}






[jira] [Updated] (HDFS-10642) Fix intermittently failing UT TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas

2016-07-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10642:
-
Status: Patch Available  (was: Open)

> Fix intermittently failing UT 
> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas
> ---
>
> Key: HDFS-10642
> URL: https://issues.apache.org/jira/browse/HDFS-10642
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10642.000.patch
>
>
> See [example stack trace | 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/10001/testReport/org.apache.hadoop.hdfs.server.datanode.fsdataset.impl/TestLazyPersistReplicaRecovery/testDnRestartWithSavedReplicas/].
> h6. Error Message
> {code}
>  Expected: is 
>  but: was 
> {code}
> h6. Stacktrace
> {code}
> java.lang.AssertionError: 
> Expected: is 
>  but: was 
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.junit.Assert.assertThat(Assert.java:865)
>   at org.junit.Assert.assertThat(Assert.java:832)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:141)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:53)
> {code}






[jira] [Updated] (HDFS-10642) Fix intermittently failing UT TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas

2016-07-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10642:
-
Attachment: HDFS-10642.000.patch

The root cause of this test failure (and of related ones) is that the tests 
rely on either a fixed sleep time or {{triggerBlockReport}} before asserting 
the expected file replicas on the storage type. However, the fixed sleep may 
not be long enough, or the block report may not yet be fully processed.

The v0 patch retries until the condition (storage types match) holds, using 
{{GenericTestUtils.waitFor()}}.
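
The idea of polling for a condition instead of sleeping for a fixed time can be sketched as follows. This is a self-contained illustration, not Hadoop's {{GenericTestUtils}} implementation; {{WaitForSketch}} and its boolean return value are assumptions made so the example needs no external library.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; imitates the idea, not Hadoop's actual code. */
final class WaitForSketch {

  /**
   * Re-evaluates the check every checkEveryMillis until it is true or
   * waitForMillis have elapsed. Returns true if the condition was met,
   * false on timeout or interruption.
   */
  static boolean waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis) {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out; the caller decides how to fail the test
      }
      try {
        Thread.sleep(checkEveryMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

A test would assert on the returned value, so a slow block report costs a few extra polls rather than an intermittent assertion failure.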

Ping [~arpitagarwal] and [~xyao] for review, as you previously worked on the 
tests.

> Fix intermittently failing UT 
> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas
> ---
>
> Key: HDFS-10642
> URL: https://issues.apache.org/jira/browse/HDFS-10642
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10642.000.patch
>
>
> See [example stack trace | 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/10001/testReport/org.apache.hadoop.hdfs.server.datanode.fsdataset.impl/TestLazyPersistReplicaRecovery/testDnRestartWithSavedReplicas/].
> h6. Error Message
> {code}
>  Expected: is 
>  but: was 
> {code}
> h6. Stacktrace
> {code}
> java.lang.AssertionError: 
> Expected: is 
>  but: was 
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.junit.Assert.assertThat(Assert.java:865)
>   at org.junit.Assert.assertThat(Assert.java:832)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:141)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:53)
> {code}






[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2016-07-25 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392984#comment-15392984
 ] 

Xiao Chen commented on HDFS-9276:
-

Thanks [~jzhuge] for revving.

As we discussed offline, fixing the copy constructor to do a deep copy seems 
like the reasonable thing to do here, and the change looks cleaner. :)

Nits:
- In {{Credentials#addToken}}, let's keep the nullity check on alias.
- We don't need the type cast to {{PrivateToken}} when checking 
{{token.getService().equals(alias)}}.

Also, I had a quick try locally, and the added test case seems to pass even 
without the fix. Could you take a further look?
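
For readers following the deep-copy point above, here is a minimal sketch of a copy constructor that clones each stored token instead of sharing references, with a nullity check on the alias. {{TokenSketch}} and {{CredentialsSketch}} are hypothetical stand-ins and do not reflect the actual {{Credentials}} or {{Token}} classes or the patch under review.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mutable token; stands in for a real token class. */
final class TokenSketch {
  private byte[] password;

  TokenSketch(byte[] password) {
    this.password = password.clone();
  }

  /** Copy constructor: the new token owns its own byte array. */
  TokenSketch(TokenSketch other) {
    this(other.password);
  }

  void setPassword(byte[] newPassword) {
    this.password = newPassword.clone();
  }

  byte[] getPassword() {
    return password.clone();
  }
}

/** Hypothetical credentials holder keyed by alias. */
final class CredentialsSketch {
  private final Map<String, TokenSketch> tokens = new HashMap<>();

  CredentialsSketch() {
  }

  /** Deep copy: every token is duplicated, so later updates do not leak across. */
  CredentialsSketch(CredentialsSketch other) {
    for (Map.Entry<String, TokenSketch> e : other.tokens.entrySet()) {
      tokens.put(e.getKey(), new TokenSketch(e.getValue()));
    }
  }

  void addToken(String alias, TokenSketch token) {
    if (alias == null || token == null) {
      throw new IllegalArgumentException("alias and token must be non-null");
    }
    tokens.put(alias, token);
  }

  TokenSketch getToken(String alias) {
    return tokens.get(alias);
  }
}
```

With a shallow copy, updating a token in the original would also change what the copy sees; with the deep copy above, the two objects evolve independently.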

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, HDFS-9276.14.patch, 
> HDFS-9276.15.patch, HDFS-9276.16.patch, HDFS-9276.17.patch, 
> HDFS-9276.18.patch, HDFSReadLoop.scala, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> HDFS Client will generate private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens will not be updated, which 
> will cause the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
>     val keytab = "/path/to/keytab/xxx.keytab"
>     val principal = "x...@abc.com"
>     val creds1 = new org.apache.hadoop.security.Credentials()
>     val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>     ugi1.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         val fs = FileSystem.get(new Configuration())
>         fs.addDelegationTokens("test", creds1)
>         null
>       }
>     })
>     val ugi = UserGroupInformation.createRemoteUser("test")
>     ugi.addCredentials(creds1)
>     ugi.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         var i = 0
>         while (true) {
>           val creds1 = new org.apache.hadoop.security.Credentials()
>           val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>           ugi1.doAs(new PrivilegedExceptionAction[Void] {
>             // Get a copy of the credentials
>             override def run(): Void = {
>               val fs = FileSystem.get(new Configuration())
>               fs.addDelegationTokens("test", creds1)
>               null
>             }
>           })
>           UserGroupInformation.getCurrentUser.addCredentials(creds1)
>           val fs = FileSystem.get(new Configuration())
>           i += 1
>           println()
>           println(i)
>           println(fs.listFiles(new Path("/user"), false))
>           Thread.sleep(60 * 1000)
>         }
>         null
>       }
>     })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>

[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Attachment: HDFS-10682.003.patch

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a 
> thread. The goal is to expose this so that users can identify operations that 
> lock the dataset for a long time ("long" in some sense) and can 
> understand/reason about/track the operation based on logs.
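
As a rough illustration of what such a metric could look like, the sketch below wraps a {{ReentrantLock}}, records how long each outermost acquisition was held, and counts holds above a threshold. It is not the attached patch; {{TimedLockSketch}}, the injected nanosecond clock, and the counters are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

/** Hypothetical lock wrapper that measures hold time; not the FSDataSetImpl code. */
final class TimedLockSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final long warnThresholdNanos;
  private final LongSupplier nanoClock;
  private long acquiredAtNanos; // written and read only while the lock is held
  private final AtomicLong maxHeldNanos = new AtomicLong();
  private final AtomicLong longHoldCount = new AtomicLong();

  TimedLockSketch(long warnThresholdNanos, LongSupplier nanoClock) {
    this.warnThresholdNanos = warnThresholdNanos;
    this.nanoClock = nanoClock;
  }

  void lock() {
    lock.lock();
    if (lock.getHoldCount() == 1) {
      // Start timing only on the outermost acquisition.
      acquiredAtNanos = nanoClock.getAsLong();
    }
  }

  void unlock() {
    if (lock.getHoldCount() == 1) {
      long held = nanoClock.getAsLong() - acquiredAtNanos;
      maxHeldNanos.accumulateAndGet(held, Math::max);
      if (held > warnThresholdNanos) {
        // A real implementation might also log the holder's stack trace here.
        longHoldCount.incrementAndGet();
      }
    }
    lock.unlock();
  }

  long getMaxHeldNanos() {
    return maxHeldNanos.get();
  }

  long getLongHoldCount() {
    return longHoldCount.get();
  }
}
```

In production the clock would typically be {{System::nanoTime}}; injecting it keeps the example deterministic. Timing only the outermost acquisition avoids double counting when the lock is re-entered by the same thread.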






[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Status: In Progress  (was: Patch Available)

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a 
> thread. The goal is to expose this so that users can identify operations that 
> lock the dataset for a long time ("long" in some sense) and can 
> understand/reason about/track the operation based on logs.






[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Status: Patch Available  (was: In Progress)

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a 
> thread. The goal is to expose this so that users can identify operations that 
> lock the dataset for a long time ("long" in some sense) and can 
> understand/reason about/track the operation based on logs.






[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2016-07-25 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392940#comment-15392940
 ] 

John Zhuge commented on HDFS-9276:
--

The test error in {{TestBalancerWithSaslDataTransfer.testBalancer0Integrity}} 
is not related.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, HDFS-9276.14.patch, 
> HDFS-9276.15.patch, HDFS-9276.16.patch, HDFS-9276.17.patch, 
> HDFS-9276.18.patch, HDFSReadLoop.scala, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> HDFS Client will generate private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens will not be updated, which 
> will cause the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
>     val keytab = "/path/to/keytab/xxx.keytab"
>     val principal = "x...@abc.com"
>     val creds1 = new org.apache.hadoop.security.Credentials()
>     val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>     ugi1.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         val fs = FileSystem.get(new Configuration())
>         fs.addDelegationTokens("test", creds1)
>         null
>       }
>     })
>     val ugi = UserGroupInformation.createRemoteUser("test")
>     ugi.addCredentials(creds1)
>     ugi.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         var i = 0
>         while (true) {
>           val creds1 = new org.apache.hadoop.security.Credentials()
>           val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>           ugi1.doAs(new PrivilegedExceptionAction[Void] {
>             // Get a copy of the credentials
>             override def run(): Void = {
>               val fs = FileSystem.get(new Configuration())
>               fs.addDelegationTokens("test", creds1)
>               null
>             }
>           })
>           UserGroupInformation.getCurrentUser.addCredentials(creds1)
>           val fs = FileSystem.get(new Configuration())
>           i += 1
>           println()
>           println(i)
>           println(fs.listFiles(new Path("/user"), false))
>           Thread.sleep(60 * 1000)
>         }
>         null
>       }
>     })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(

[jira] [Commented] (HDFS-10676) Add namenode metric to measure time spent in generating EDEKs

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392936#comment-15392936
 ] 

Hadoop QA commented on HDFS-10676:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
56s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 27s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 9 new + 65 unchanged - 0 fixed = 74 total (was 65) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 37s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.server.balancer.TestBalancer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12820033/HDFS-10676.001.patch |
| JIRA Issue | HDFS-10676 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8e16969077e1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 703fdf8 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16177/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16177/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16177/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16177/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add namenode metric to measure time spent in generating EDEKs
> --

[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392929#comment-15392929
 ] 

Hadoop QA commented on HDFS-9276:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m  
8s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  5m 
32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
43s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 11s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}130m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12820037/HDFS-9276.18.patch |
| JIRA Issue | HDFS-9276 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 7b859c3b5844 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 703fdf8 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16176/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16176/testReport/ |
| modules | C: hadoop-common-project/hadoop-common 
hadoop-hdfs-project/hadoop-hdfs U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16176/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.a

[jira] [Comment Edited] (HDFS-10629) Federation Router

2016-07-25 Thread Jason Kace (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392884#comment-15392884
 ] 

Jason Kace edited comment on HDFS-10629 at 7/26/16 12:13 AM:
-

Updating Federation Router Patch:

1) Added basic unit tests for router service and RPC server NN commands.
2) Removed extra dependencies that are not critical to the basic router 
functionality (StateStore, HeartbeatService, SafemodeService).  I left the 
NamenodeStatusReport as a container to register NNs to facilitate unit testing; 
it isn't used by the router.
3) NN discovery/registration and the file/subcluster locator are mocked in the 
test tree to enable unit tests.  These components are required for a 
functioning router and will be included in additional sub jiras.
4) Code cleanup, fixes for Jenkins errors, and commenting.

All comments and feedback are appreciated.


was (Author: jakace):
Updating Federation Router Patch:

1) Added basic unit tests for router service and RPC server NN commands.
2) Removed extra dependencies that are not critical to the basic router 
functionality (StateStore, HeartbeatService, SafemodeService). 
3) NN discovery/registration and the file/subcluster locator are mocked in the 
test tree to enable unit tests.  These components are required for a 
functioning router and will be included in additional sub jiras.
4) Code cleanup, fixes for Jenkins errors, and commenting.

All comments and feedback are appreciated.

> Federation Router
> -
>
> Key: HDFS-10629
> URL: https://issues.apache.org/jira/browse/HDFS-10629
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Inigo Goiri
>Assignee: Jason Kace
> Attachments: HDFS-10629-HDFS-10467-002.patch, HDFS-10629.000.patch, 
> HDFS-10629.001.patch
>
>
> Component that routes calls from the clients to the right Namespace. It 
> implements {{ClientProtocol}}.






[jira] [Updated] (HDFS-10519) Add a configuration option to enable in-progress edit log tailing

2016-07-25 Thread Jiayi Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Zhou updated HDFS-10519:
--
Attachment: HDFS-10519.011.patch

> Add a configuration option to enable in-progress edit log tailing
> -
>
> Key: HDFS-10519
> URL: https://issues.apache.org/jira/browse/HDFS-10519
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Reporter: Jiayi Zhou
>Assignee: Jiayi Zhou
>Priority: Minor
> Attachments: HDFS-10519.001.patch, HDFS-10519.002.patch, 
> HDFS-10519.003.patch, HDFS-10519.004.patch, HDFS-10519.005.patch, 
> HDFS-10519.006.patch, HDFS-10519.007.patch, HDFS-10519.008.patch, 
> HDFS-10519.009.patch, HDFS-10519.010.patch, HDFS-10519.011.patch
>
>
> The Standby Namenode can tail in-progress edit logs to improve data 
> freshness. In-progress tailing is already implemented, but it is not enabled 
> by default, and there is no configuration key to turn it on.
> Adding a configuration key that lets the Standby Namenode enable in-progress 
> tailing is reasonable and would be a basis for further improvement of the 
> Standby Namenode.






[jira] [Comment Edited] (HDFS-10629) Federation Router

2016-07-25 Thread Jason Kace (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392884#comment-15392884
 ] 

Jason Kace edited comment on HDFS-10629 at 7/26/16 12:07 AM:
-

Updating Federation Router Patch:

1) Added basic unit tests for router service and RPC server NN commands.
2) Removed extra dependencies that are not critical to the basic router 
functionality (StateStore, HeartbeatService, SafemodeService). 
3) NN discovery/registration and the file/subcluster locator are mocked in the 
test tree to enable unit tests.  These components are required for a 
functioning router and will be included in additional sub jiras.
4) Code cleanup, fixes for Jenkins errors, and commenting.

All comments and feedback are appreciated.


was (Author: jakace):
Updating Federation Router Patch:

1) Added basic unit tests for router service and RPC server NN commands.
2) Removed extra dependencies that are not critical to the basic router 
functionality.  
3) NN discovery/registration and the file/subcluster locator are mocked in the 
test tree to enable unit tests.  These components are required for a 
functioning router and will be included in additional sub jiras.
4) Code cleanup, fixes for Jenkins errors, and commenting.

All comments and feedback are appreciated.

> Federation Router
> -
>
> Key: HDFS-10629
> URL: https://issues.apache.org/jira/browse/HDFS-10629
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Inigo Goiri
>Assignee: Jason Kace
> Attachments: HDFS-10629-HDFS-10467-002.patch, HDFS-10629.000.patch, 
> HDFS-10629.001.patch
>
>
> Component that routes calls from the clients to the right Namespace. It 
> implements {{ClientProtocol}}.






[jira] [Updated] (HDFS-10629) Federation Router

2016-07-25 Thread Jason Kace (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Kace updated HDFS-10629:
--
Attachment: HDFS-10629-HDFS-10467-002.patch

Updating Federation Router Patch:

1) Added basic unit tests for router service and RPC server NN commands.
2) Removed extra dependencies that are not critical to the basic router 
functionality.  
3) NN discovery/registration and the file/subcluster locator are mocked in the 
test tree to enable unit tests.  These components are required for a 
functioning router and will be included in additional sub jiras.
4) Code cleanup, fixes for Jenkins errors, and commenting.

All comments and feedback are appreciated.

> Federation Router
> -
>
> Key: HDFS-10629
> URL: https://issues.apache.org/jira/browse/HDFS-10629
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Inigo Goiri
>Assignee: Jason Kace
> Attachments: HDFS-10629-HDFS-10467-002.patch, HDFS-10629.000.patch, 
> HDFS-10629.001.patch
>
>
> Component that routes calls from the clients to the right Namespace. It 
> implements {{ClientProtocol}}.






[jira] [Commented] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392883#comment-15392883
 ] 

Hadoop QA commented on HDFS-10682:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m  
4s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 188 new + 79 unchanged - 41 fixed = 267 total (was 120) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
55s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m  1s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Inconsistent synchronization of 
org.apache.hadoop.hdfs.LockChecker.warningsSuppressed; locked 50% of time.  
Unsynchronized access at LockChecker.java:[line 150] |
|  |  Should org.apache.hadoop.hdfs.LockChecker$OperationLockInfomation be a 
_static_ inner class?  At LockChecker.java:[lines 67-82] |
| Failed junit tests | hadoop.hdfs.TestParallelShortCircuitReadUnCached |
|   | hadoop.hdfs.TestErasureCodeBenchmarkThroughput |
|   | hadoop.hdfs.TestParallelUnixDomainRead |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12820027/HDFS-10682.002.patch |
| JIRA Issue | HDFS-10682 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4d097e8c7a30 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 703fdf8 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 

[jira] [Commented] (HDFS-10609) Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392882#comment-15392882
 ] 

Hadoop QA commented on HDFS-10609:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 
95 unchanged - 0 fixed = 96 total (was 95) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
0s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m  9s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 95m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestEncryptedTransfer |
|   | hadoop.hdfs.server.datanode.TestDataNodeMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12820024/HDFS-10609.001.patch |
| JIRA Issue | HDFS-10609 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 82051ce2aa95 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 703fdf8 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16174/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16174/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/1

[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-25 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392868#comment-15392868
 ] 

Konstantin Shvachko commented on HDFS-10301:


The {{TestWebHdfsTimeouts}} failure does not appear to be related to the changes.
The last patch looks good.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block 
> report, and it then sends the block report again. While processing these two 
> reports at the same time, the NameNode can interleave the processing of 
> storages from different reports. This corrupts the blockReportId field, which 
> makes the NameNode think that some storages are zombies. Replicas from zombie 
> storages are removed immediately, causing missing blocks.
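
The interleaving described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the HDFS implementation: it assumes the NameNode stamps each storage with the ID of the last report that touched it and, after the final storage of a report, treats any storage carrying a different stamp as a zombie. The class, method, and storage names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of zombie-storage detection; names are hypothetical, not HDFS classes. */
public class ZombieStorageSketch {
    /** Storage ID mapped to the ID of the last block report that touched it. */
    private final Map<String, Long> lastReportId = new LinkedHashMap<>();

    /**
     * Processes one storage from a report. If it is the last storage of that
     * report, returns every storage whose stamp differs from reportId.
     */
    public List<String> process(long reportId, String storage, boolean lastInReport) {
        lastReportId.put(storage, reportId);
        List<String> zombies = new ArrayList<>();
        if (lastInReport) {
            for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
                if (e.getValue() != reportId) {
                    zombies.add(e.getKey());
                }
            }
        }
        return zombies;
    }

    /** Report 1 is fully processed before its retransmission, report 2. */
    public static List<String> inOrder() {
        ZombieStorageSketch nn = new ZombieStorageSketch();
        nn.process(1, "s1", false);
        nn.process(1, "s2", true);
        nn.process(2, "s1", false);
        return nn.process(2, "s2", true);
    }

    /** Storages from report 1 and its retransmission are processed interleaved. */
    public static List<String> interleaved() {
        ZombieStorageSketch nn = new ZombieStorageSketch();
        nn.process(1, "s1", false);
        nn.process(2, "s1", false);
        nn.process(2, "s2", true);
        // Report 1 finishes last, so s1 (stamped by report 2) looks stale.
        return nn.process(1, "s2", true);
    }

    public static void main(String[] args) {
        System.out.println("in order:    " + inOrder());
        System.out.println("interleaved: " + interleaved());
    }
}
```

In this model the in-order run flags nothing, while the interleaved run flags a healthy storage, which is the false zombie declaration the issue describes.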






[jira] [Commented] (HDFS-10676) Add namenode metric to measure time spent in generating EDEKs

2016-07-25 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392851#comment-15392851
 ] 

Arpit Agarwal commented on HDFS-10676:
--

Thanks for submitting the patch [~hanishakoneru]. 

Two quick comments:
# It looks like we need separate metrics for generateEncryptedKey and 
warmUpEncryptedKeys.
# The metric should be a {{MutableQuantiles}} so we can get latency percentiles.

Also tagging [~xyao] who understands this part of the code much better.
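
To make the two review comments concrete, here is a plain-Java sketch of keeping one latency metric per operation and reading percentiles from each. It is an illustration only and does not use the Hadoop {{MutableQuantiles}} API; the class {{LatencySamples}}, its methods, and the sample values are hypothetical, and it keeps every sample in memory, which a real quantile estimator would avoid.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical per-operation latency metric; not the Hadoop MutableQuantiles class. */
public class LatencySamples {
    private final List<Long> samplesMillis = new ArrayList<>();

    /** Records one observed latency in milliseconds. */
    public synchronized void add(long millis) {
        samplesMillis.add(millis);
    }

    /** Nearest-rank percentile over all recorded samples; p is in (0, 100]. */
    public synchronized long percentile(double p) {
        if (samplesMillis.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        List<Long> sorted = new ArrayList<>(samplesMillis);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) {
        // One metric per operation, so the two latency distributions stay separate.
        LatencySamples generateEncryptedKey = new LatencySamples();
        LatencySamples warmUpEncryptedKeys = new LatencySamples();
        for (long ms = 1; ms <= 100; ms++) {
            generateEncryptedKey.add(ms);
        }
        warmUpEncryptedKeys.add(250);
        System.out.println("generateEncryptedKey p99 = " + generateEncryptedKey.percentile(99));
        System.out.println("warmUpEncryptedKeys p99 = " + warmUpEncryptedKeys.percentile(99));
    }
}
```

Keeping the samples separate means a slow warm-up call does not distort the percentiles reported for key generation.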

> Add namenode metric to measure time spent in generating EDEKs
> -
>
> Key: HDFS-10676
> URL: https://issues.apache.org/jira/browse/HDFS-10676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>  Labels: metrics, namenode
> Attachments: HDFS-10676.000.patch, HDFS-10676.001.patch
>
>
> A metric to measure the time spent by Namenode in interacting with Key 
> Management System (KMS).






[jira] [Commented] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392856#comment-15392856
 ] 

Hadoop QA commented on HDFS-10688:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 
29s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 57m  
7s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12820023/HDFS-10688.002.patch |
| JIRA Issue | HDFS-10688 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 9639cb67488c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 703fdf8 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16172/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16172/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Attachments: HDFS-10688.001.p

[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392845#comment-15392845
 ] 

Hadoop QA commented on HDFS-10301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 368 unchanged - 12 fixed = 368 total (was 380) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m  0s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 51s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12819238/HDFS-10301.012.patch |
| JIRA Issue | HDFS-10301 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux b189d80c0730 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 703fdf8 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16171/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16171/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16171/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
>   

[jira] [Commented] (HDFS-10650) DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory permission

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392844#comment-15392844
 ] 

Hadoop QA commented on HDFS-10650:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 36m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 15s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 56s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 61m 41s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m 57s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818945/HDFS-10650.002.patch |
| JIRA Issue | HDFS-10650 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 2e801496eaf2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 86ae218 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16170/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16170/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory 
> permission
> 

[jira] [Commented] (HDFS-10667) Report more accurate info about data corruption location

2016-07-25 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392784#comment-15392784
 ] 

Yongjun Zhang commented on HDFS-10667:
--

Hi [~yuanbo], one suggestion: when investigating an issue, we tend to grep for 
the block id in the log file, so it would be easier if all of this info were on 
the same line. I therefore suggest dropping the newlines ("\n"). 

Thanks.
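
The suggestion above can be sketched as follows. This is only an illustration of a single-line message; the class name {{CorruptionLogFormat}}, the method {{singleLine}}, and the message wording are hypothetical and are not taken from the HDFS-10667 patch.

```java
// Hypothetical helper, not taken from the HDFS-10667 patch: it only
// illustrates why a single-line message is easier to grep by block id.
final class CorruptionLogFormat {
  private CorruptionLogFormat() {}

  // Joins the fields with ", " instead of "\n" so that the block id and
  // both offsets end up on the same log line.
  static String singleLine(String blockId, long offsetInBlock,
      long offsetInPacket) {
    return "Checksum error in block " + blockId
        + ", offset in block: " + offsetInBlock
        + ", offset in packet: " + offsetInPacket;
  }
}
```

With a message built this way, a grep for the block id returns the offsets together with it instead of only the first line of a multi-line message.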


> Report more accurate info about data corruption location
> 
>
> Key: HDFS-10667
> URL: https://issues.apache.org/jira/browse/HDFS-10667
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs
>Reporter: Yongjun Zhang
>Assignee: Yuanbo Liu
> Attachments: HDFS-10667.001.patch, HDFS-10667.002.patch
>
>
> Per 
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15376897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15376897
> 129.77 report:
> {code}
> 2016-07-13 11:49:01,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_1116167880_42906656 src: /10.6.134.229:43844 dest: /10.6.129.77:5080
> 2016-07-13 11:49:01,543 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Checksum error in block blk_1116167880_42906656 from /10.6.134.229:43844
> org.apache.hadoop.fs.ChecksumException: Checksum error: DFSClient_NONMAPREDUCE_2019484565_1 at 81920 exp: 1352119728 got: -1012279895
>   at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native Method)
>   at org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
>   at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
>   at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:421)
>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:558)
>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-07-13 11:49:01,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for blk_1116167880_42906656
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: Unexpected checksum mismatch while writing blk_1116167880_42906656 from /10.6.134.229:43844
>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:571)
>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> and
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15378879&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15378879
> {quote}
> While verifying only packet, the position mentioned in the checksum 
> exception, is relative to packet buffer offset, not the block offset. So 
> 81920 is the offset in the exception.
> {quote}
> Create this jira to report more accurate corruption location information: the 
> offset in the file, offset in block, and offset in packet.
> See 
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15387083&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15387083
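
The three offsets named in the description relate to each other by simple addition. A minimal sketch follows, assuming the caller already knows where the packet starts in the block and where the block starts in the file. The class {{CorruptionLocation}} and its fields are hypothetical and are not HDFS code.

```java
// Hypothetical value class; names are illustrative only and do not come
// from the HDFS source or from any HDFS-10667 patch.
final class CorruptionLocation {
  final long offsetInPacket;
  final long offsetInBlock;
  final long offsetInFile;

  // blockStartInFile:   assumed file offset of the first byte of the block
  // packetStartInBlock: assumed block offset of the first byte of the packet
  // offsetInPacket:     position reported relative to the packet buffer
  CorruptionLocation(long blockStartInFile, long packetStartInBlock,
      long offsetInPacket) {
    this.offsetInPacket = offsetInPacket;
    this.offsetInBlock = packetStartInBlock + offsetInPacket;
    this.offsetInFile = blockStartInFile + this.offsetInBlock;
  }

  @Override
  public String toString() {
    // Kept on one line so the location can be found with a single grep.
    return "offset in file: " + offsetInFile
        + ", offset in block: " + offsetInBlock
        + ", offset in packet: " + offsetInPacket;
  }
}
```

The point of carrying all three values is that a packet-relative number such as 81920 is ambiguous on its own, because it says nothing about which part of the block or file is affected.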






[jira] [Commented] (HDFS-10519) Add a configuration option to enable in-progress edit log tailing

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392766#comment-15392766
 ] 

Hadoop QA commented on HDFS-10519:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} | {color:red} HDFS-10519 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12819511/HDFS-10519.010.patch |
| JIRA Issue | HDFS-10519 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16175/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add a configuration option to enable in-progress edit log tailing
> -
>
> Key: HDFS-10519
> URL: https://issues.apache.org/jira/browse/HDFS-10519
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Reporter: Jiayi Zhou
>Assignee: Jiayi Zhou
>Priority: Minor
> Attachments: HDFS-10519.001.patch, HDFS-10519.002.patch, 
> HDFS-10519.003.patch, HDFS-10519.004.patch, HDFS-10519.005.patch, 
> HDFS-10519.006.patch, HDFS-10519.007.patch, HDFS-10519.008.patch, 
> HDFS-10519.009.patch, HDFS-10519.010.patch
>
>
> Standby Namenode has the option to do in-progress edit log tailing to improve 
> data freshness. In-progress tailing is already implemented, but it is not 
> enabled by default, and there is no configuration key to turn it on.
> Adding a configuration key that lets the Standby Namenode enable it is 
> reasonable and would be a basis for further improvement on Standby Namenode.






[jira] [Commented] (HDFS-10598) DiskBalancer does not execute multi-steps plan.

2016-07-25 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392760#comment-15392760
 ] 

Wei-Chiu Chuang commented on HDFS-10598:


Hi [~eddyxu], thanks for identifying the bug and submitting the patch. I think the 
fix is straightforward and the unit test makes sense to me. I wonder if we need 
more unit tests to cover more scenarios, because in addition to the normal 
operation, the patch fixes the termination condition in these corner cases:
* {code}// Check for the max error count constraint.{code}
* {code}// we are not able to find any blocks to copy.{code}
* {code}// check if someone told us exit{code}
* {code}// Technically it is possible for us to find a smaller block and{code}

[~arpitagarwal], what's your take?

> DiskBalancer does not execute multi-steps plan.
> ---
>
> Key: HDFS-10598
> URL: https://issues.apache.org/jira/browse/HDFS-10598
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: diskbalancer
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Critical
> Attachments: HDFS-10598.00.patch
>
>
> I set up a 3-DN cluster, each node with 2 small disks.  After creating 
> some files to fill HDFS, I added two more small disks to one DN and ran the 
> diskbalancer on this DataNode.
> The disk usage before running diskbalancer:
> {code}
> /dev/loop0  3.9G  2.1G  1.6G 58%  /mnt/data1
> /dev/loop1  3.9G  2.6G  1.1G 71%  /mnt/data2
> /dev/loop2  3.9G  17M  3.6G 1%  /mnt/data3
> /dev/loop3  3.9G  17M  3.6G 1%  /mnt/data4
> {code}
> However, after running diskbalancer (i.e., {{-query}} shows {{PLAN_DONE}})
> {code}
> /dev/loop0  3.9G  1.2G  2.5G 32%  /mnt/data1
> /dev/loop1  3.9G  2.6G  1.1G 71%  /mnt/data2
> /dev/loop2  3.9G  953M  2.7G 26%  /mnt/data3
> /dev/loop3  3.9G  17M  3.6G 1%   /mnt/data4
> {code}
> It is suspicious that in {{DiskBalancerMover#copyBlocks}}, every return calls 
> {{this.setExitFlag}}, which prevents {{copyBlocks()}} from being called 
> multiple times from {{DiskBalancer#executePlan}}. 
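
The suspected failure mode can be modelled in a few lines. The sketch below is a simplified, hypothetical model and not the DiskBalancer source: {{MultiStepMover}}, its flag, and its methods only stand in for {{executePlan}} calling {{copyBlocks()}} once per plan step.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified, hypothetical model of the suspected bug; this is not the
// DiskBalancer source. Each list element stands for one step of a plan.
final class MultiStepMover {
  private final AtomicBoolean shouldRun = new AtomicBoolean(true);
  private final boolean setExitFlagOnReturn;
  private int stepsExecuted = 0;

  MultiStepMover(boolean setExitFlagOnReturn) {
    this.setExitFlagOnReturn = setExitFlagOnReturn;
  }

  // Stands in for copyBlocks(): it does no work once the exit flag is set.
  private void copyBlocks(String step) {
    if (!shouldRun.get()) {
      return;
    }
    stepsExecuted++;
    if (setExitFlagOnReturn) {
      // Models "every return sets the exit flag": later steps are skipped.
      shouldRun.set(false);
    }
  }

  // Stands in for executePlan(): calls copyBlocks() once per step and
  // returns how many steps actually did work.
  int executePlan(List<String> steps) {
    for (String step : steps) {
      copyBlocks(step);
    }
    return stepsExecuted;
  }
}
```

In this model, a two-step plan executes only its first step when the flag is set on every return, which is consistent with the disk usage above where only one of the two new disks received data.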






[jira] [Updated] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2016-07-25 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-9276:
-
Attachment: HDFS-9276.18.patch

Patch 18:
* Change {{Token}} copy constructor to do deep copies.
* No need for the extra field {{PrivateToken#publicService}}, thus no change to 
{{PrivateToken}} at all.
* Change {{Credentials.addToken}} to use {{getService()}}.
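
To illustrate what a deep-copying copy constructor means here, the following is a minimal sketch using a hypothetical {{SimpleToken}} class. It is not Hadoop's {{Token}}; the field names and the choice to expose the internal array from the getter are assumptions made only to show the effect.

```java
// Hypothetical stand-in for a token-like class; this is not Hadoop's Token.
final class SimpleToken {
  private final byte[] identifier;
  private final byte[] password;
  private final String service;

  SimpleToken(byte[] identifier, byte[] password, String service) {
    this.identifier = identifier;
    this.password = password;
    this.service = service;
  }

  // Copy constructor doing a deep copy: the byte arrays are cloned, so the
  // copy shares no mutable state with the original. String is immutable,
  // so sharing the reference is safe.
  SimpleToken(SimpleToken other) {
    this.identifier = other.identifier.clone();
    this.password = other.password.clone();
    this.service = other.service;
  }

  // Returns the internal array (an assumption for this sketch), which is
  // why a shallow copy would let two tokens affect each other.
  byte[] getIdentifier() {
    return identifier;
  }

  String getService() {
    return service;
  }
}
```

With a shallow copy, writing to the copy's identifier array would also change the original; with the deep copy above, the two tokens stay independent.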


> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, HDFS-9276.14.patch, 
> HDFS-9276.15.patch, HDFS-9276.16.patch, HDFS-9276.17.patch, 
> HDFS-9276.18.patch, HDFSReadLoop.scala, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> HDFS Client will generate private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, so they 
> eventually expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
>     val keytab = "/path/to/keytab/xxx.keytab"
>     val principal = "x...@abc.com"
>     val creds1 = new org.apache.hadoop.security.Credentials()
>     val ugi1 =
>       UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>     ugi1.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         val fs = FileSystem.get(new Configuration())
>         fs.addDelegationTokens("test", creds1)
>         null
>       }
>     })
>     val ugi = UserGroupInformation.createRemoteUser("test")
>     ugi.addCredentials(creds1)
>     ugi.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         var i = 0
>         while (true) {
>           val creds1 = new org.apache.hadoop.security.Credentials()
>           val ugi1 =
>             UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>           ugi1.doAs(new PrivilegedExceptionAction[Void] {
>             // Get a copy of the credentials
>             override def run(): Void = {
>               val fs = FileSystem.get(new Configuration())
>               fs.addDelegationTokens("test", creds1)
>               null
>             }
>           })
>           UserGroupInformation.getCurrentUser.addCredentials(creds1)
>           val fs = FileSystem.get(new Configuration())
>           i += 1
>           println()
>           println(i)
>           println(fs.listFiles(new Path("/user"), false))
>           Thread.sleep(60 * 1000)
>         }
>         null
>       }
>     })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration to Name Node:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.

[jira] [Updated] (HDFS-10676) Add namenode metric to measure time spent in generating EDEKs

2016-07-25 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-10676:
--
Status: Patch Available  (was: In Progress)

> Add namenode metric to measure time spent in generating EDEKs
> -
>
> Key: HDFS-10676
> URL: https://issues.apache.org/jira/browse/HDFS-10676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>  Labels: metrics, namenode
> Attachments: HDFS-10676.000.patch, HDFS-10676.001.patch
>
>
> A metric to measure the time spent by Namenode in interacting with Key 
> Management System (KMS).






[jira] [Updated] (HDFS-10676) Add namenode metric to measure time spent in generating EDEKs

2016-07-25 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-10676:
--
Attachment: HDFS-10676.001.patch

> Add namenode metric to measure time spent in generating EDEKs
> -
>
> Key: HDFS-10676
> URL: https://issues.apache.org/jira/browse/HDFS-10676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>  Labels: metrics, namenode
> Attachments: HDFS-10676.000.patch, HDFS-10676.001.patch
>
>
> A metric to measure the time spent by Namenode in interacting with Key 
> Management System (KMS).






[jira] [Commented] (HDFS-10681) DiskBalancer: query command should report Plan file path apart from PlanID

2016-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392701#comment-15392701
 ] 

Hadoop QA commented on HDFS-10681:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  1s{color} | {color:red} Docker failed to build yetus/hadoop:85209cc. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12819760/HDFS-10681-HDFS-1312.001.patch |
| JIRA Issue | HDFS-10681 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16169/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DiskBalancer: query command should report Plan file path apart from PlanID
> --
>
> Key: HDFS-10681
> URL: https://issues.apache.org/jira/browse/HDFS-10681
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: diskbalancer
>Affects Versions: HDFS-1312
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HDFS-10681-HDFS-1312.001.patch
>
>
> DiskBalancer query command currently reports planID (SHA512 hex) only. 
> Currently ongoing disk balancing activity in a datanode can be cancelled 
> either by planID + datanode_address or just by pointing to the right plan 
> file. Since there could be many plan files, to avoid ambiguity it's better if 
> the query command also reports the plan file path.
> {noformat}
> $ hdfs diskbalancer --help query 
> usage: hdfs diskbalancer -query   [options]
> Query Plan queries a given data node about the current state of disk
> balancer execution.
> --queryQueries the disk balancer status of a given datanode.
> Query command retrievs *the plan ID* and the current running state.
> {noformat}
> Sample query command output:
> {noformat}
> 16/06/20 15:42:16 INFO command.Command: Executing "query plan" command.
> Plan ID: 
> 04f41e2e1fa2d63558284be85155ea68154fb6ab435f1078c642d605d06626f176da16b321b35c99f1f6cd0cd77090c8743bb9a19190c4a01b5f8c51a515e240
>  Result: PLAN_UNDER_PROGRESS
> or
> 16/06/20 15:46:09 INFO command.Command: Executing "query plan" command.
> Plan ID: 
> 04f41e2e1fa2d63558284be85155ea68154fb6ab435f1078c642d605d06626f176da16b321b35c99f1f6cd0cd77090c8743bb9a19190c4a01b5f8c51a515e240
>  Result: PLAN_DONE
> {noformat}
> Cancel command syntax:
> {noformat}
> $ hdfs diskbalancer --help cancel
> *usage: hdfs diskbalancer -cancel  | -cancel  -node
> *
> Cancel command cancels a running disk balancer operation.
> --cancelCancels a running plan using a plan file.
> --node  Cancels a running plan using a plan ID and hostName
> Cancel command can be run via pointing to a plan file, or by reading the
> plan ID using the query command and then using planID and hostname.
> Examples of how to run this command are
> hdfs diskbalancer -cancel 
> hdfs diskbalancer -cancel  -node 
> {noformat}






[jira] [Commented] (HDFS-10650) DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory permission

2016-07-25 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392670#comment-15392670
 ] 

Xiao Chen commented on HDFS-10650:
--

Thanks for the comments, Chris and John. I think patch 2 is good.

I also went through the {{applyUMask}} call sites and the comments in 
HADOOP-9155, and didn't see any problems. But this just feels like a huge change in 
behavior, so I will let it sink in for a couple of days.

Kicked a new jenkins run.

> DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory 
> permission
> -
>
> Key: HDFS-10650
> URL: https://issues.apache.org/jira/browse/HDFS-10650
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
> Attachments: HDFS-10650.001.patch, HDFS-10650.002.patch
>
>
> These 2 DFSClient methods should use default directory permission to create a 
> directory.
> {code:java}
>   public boolean mkdirs(String src, FsPermission permission,
>       boolean createParent) throws IOException {
>     if (permission == null) {
>       permission = FsPermission.getDefault();
>     }
> {code}
> {code:java}
>   public boolean primitiveMkdir(String src, FsPermission absPermission,
>       boolean createParent) throws IOException {
>     checkOpen();
>     if (absPermission == null) {
>       absPermission =
>           FsPermission.getDefault().applyUMask(dfsClientConf.uMask);
>     }
> {code}
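
The effect of applying a umask to a default permission is plain bit arithmetic. A minimal sketch follows, assuming a default directory mode of 0777 and a umask of 022; both values are assumptions chosen for illustration. The helper {{UmaskDemo.applyUmask}} is hypothetical and does not use Hadoop's {{FsPermission}}.

```java
// Illustrative arithmetic only; Hadoop's FsPermission is not used here and
// the class and method names are hypothetical.
final class UmaskDemo {
  private UmaskDemo() {}

  // Clears every permission bit that is set in the umask and keeps only
  // the nine rwx bits for user, group and other.
  static int applyUmask(int mode, int umask) {
    return mode & ~umask & 0777;
  }
}
```

For example, {{UmaskDemo.applyUmask(0777, 022)}} yields 0755, whereas the same umask applied to a file-style default of 0666 yields 0644. This is why the choice of default (file or directory) matters when creating a directory.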






[jira] [Commented] (HDFS-10678) Documenting NNThroughputBenchmark tool

2016-07-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392661#comment-15392661
 ] 

Mingliang Liu commented on HDFS-10678:
--

Ping [~shv] for the proposal.

> Documenting NNThroughputBenchmark tool
> --
>
> Key: HDFS-10678
> URL: https://issues.apache.org/jira/browse/HDFS-10678
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: benchmarks, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> The best (only) documentation for the NNThroughputBenchmark currently exists 
> as a JavaDoc on the NNThroughputBenchmark class. This is less than useful, 
> especially since we no longer generate javadocs for HDFS as part of the build 
> process. I suggest we extract it into a separate markdown doc, or merge it 
> with other benchmarking materials (if any?) about HDFS.






[jira] [Commented] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392648#comment-15392648
 ] 

Chen Liang commented on HDFS-10682:
---

Removed an unnecessary instrumentation call

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a 
> thread. The goal is to expose this so users can identify operations that 
> lock the dataset for a long time ("long" in some sense) and can 
> understand/reason about/track those operations based on logs.
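
One way to measure lock held time is to wrap the lock and record timestamps on the outermost acquire and release. The sketch below assumes a reentrant lock; {{TimedLock}} and its methods are hypothetical and are not the HDFS-10682 patch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper, not the HDFS-10682 patch. It records how long the
// lock was held between the outermost lock() and the matching unlock().
final class TimedLock {
  private final ReentrantLock lock = new ReentrantLock();
  private long acquiredAtNanos;          // written only while holding lock
  private volatile long maxHeldNanos;    // longest hold observed so far
  private volatile long releaseCount;    // number of outermost releases

  void lock() {
    lock.lock();
    // Start timing only on the outermost acquire of a reentrant lock.
    if (lock.getHoldCount() == 1) {
      acquiredAtNanos = System.nanoTime();
    }
  }

  void unlock() {
    // Stop timing only on the outermost release; the updates below are
    // safe because the current thread still holds the lock.
    if (lock.getHoldCount() == 1) {
      long held = System.nanoTime() - acquiredAtNanos;
      maxHeldNanos = Math.max(maxHeldNanos, held);
      releaseCount++;
    }
    lock.unlock();
  }

  long getMaxHeldNanos() {
    return maxHeldNanos;
  }

  long getReleaseCount() {
    return releaseCount;
  }
}
```

A caller could compare the measured hold time against a threshold at release and log the operation when it is exceeded; what counts as "long" is left open here, as it is in the description.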






[jira] [Updated] (HDFS-10642) Fix intermittently failing UT TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas

2016-07-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10642:
-
Affects Version/s: (was: 3.0.0-alpha2)
 Target Version/s: 2.8.0

> Fix intermittently failing UT 
> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas
> ---
>
> Key: HDFS-10642
> URL: https://issues.apache.org/jira/browse/HDFS-10642
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> See [example stack trace | 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/10001/testReport/org.apache.hadoop.hdfs.server.datanode.fsdataset.impl/TestLazyPersistReplicaRecovery/testDnRestartWithSavedReplicas/].
> h6. Error Message
> {code}
>  Expected: is 
>  but: was 
> {code}
> h6. Stacktrace
> {code}
> java.lang.AssertionError: 
> Expected: is 
>  but: was 
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.junit.Assert.assertThat(Assert.java:865)
>   at org.junit.Assert.assertThat(Assert.java:832)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:141)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:53)
> {code}






[jira] [Updated] (HDFS-10668) Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount

2016-07-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10668:
-
Target Version/s: 2.8.0

> Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount
> -
>
> Key: HDFS-10668
> URL: https://issues.apache.org/jira/browse/HDFS-10668
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10668.000.patch
>
>
> h6.Error Message
> {code}
> After delete one file expected:<4> but was:<5>
> {code}
> h6. Stacktrace
> {code}
> java.lang.AssertionError: After delete one file expected:<4> but was:<5>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.apache.hadoop.hdfs.server.datanode.TestDataNodeMXBean.testDataNodeMXBeanBlockCount(TestDataNodeMXBean.java:124)
> {code}
> Sample failing Jenkins pre-commit build, see 
> [here|https://builds.apache.org/job/PreCommit-HDFS-Build/16094/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMXBean/testDataNodeMXBeanBlockCount/].






[jira] [Commented] (HDFS-10668) Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount

2016-07-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392656#comment-15392656
 ] 

Mingliang Liu commented on HDFS-10668:
--

Thanks [~brahmareddy] for your review.

> Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount
> -
>
> Key: HDFS-10668
> URL: https://issues.apache.org/jira/browse/HDFS-10668
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10668.000.patch
>
>
> h6.Error Message
> {code}
> After delete one file expected:<4> but was:<5>
> {code}
> h6. Stacktrace
> {code}
> java.lang.AssertionError: After delete one file expected:<4> but was:<5>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeMXBean.testDataNodeMXBeanBlockCount(TestDataNodeMXBean.java:124)
> {code}
> Sample failing Jenkins pre-commit build, see 
> [here|https://builds.apache.org/job/PreCommit-HDFS-Build/16094/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMXBean/testDataNodeMXBeanBlockCount/].






[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Status: Patch Available  (was: In Progress)

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch
>
>
> Add a metric to measure how long a thread holds the FSDataSetImpl lock. The 
> goal is to expose this so that users can identify operations that lock the 
> dataset for a long time ("long" in some sense) and can understand, reason 
> about, and track those operations based on logs.
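
To make the idea concrete, here is a minimal Java sketch of one way such a metric could be collected. It is not taken from the attached patches: the class name TimedLockSketch, the warning threshold, and the injectable nanosecond clock are all assumptions made for illustration. The sketch wraps a ReentrantLock, times only the outermost hold, and logs the holding thread when the hold exceeds the threshold.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

/** Hypothetical sketch, not the HDFS-10682 patch: a lock that records hold time. */
class TimedLockSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final long warnThresholdNanos;   // assumed tunable threshold
  private final LongSupplier nanoClock;    // e.g. System::nanoTime
  private final AtomicLong maxHeldNanos = new AtomicLong();
  private final AtomicLong longHoldCount = new AtomicLong();
  private long acquiredAtNanos;            // only touched while holding the lock

  TimedLockSketch(long warnThresholdNanos, LongSupplier nanoClock) {
    this.warnThresholdNanos = warnThresholdNanos;
    this.nanoClock = nanoClock;
  }

  void lock() {
    lock.lock();
    if (lock.getHoldCount() == 1) {        // time only the outermost acquisition
      acquiredAtNanos = nanoClock.getAsLong();
    }
  }

  void unlock() {
    boolean outermost = lock.getHoldCount() == 1;
    long heldNanos = outermost ? nanoClock.getAsLong() - acquiredAtNanos : 0L;
    lock.unlock();                         // release first, then record and log
    if (outermost) {
      record(heldNanos);
    }
  }

  private void record(long heldNanos) {
    maxHeldNanos.accumulateAndGet(heldNanos, Math::max);
    if (heldNanos >= warnThresholdNanos) {
      longHoldCount.incrementAndGet();
      System.err.println("Lock held for " + heldNanos / 1_000_000 + " ms by thread "
          + Thread.currentThread().getName());
    }
  }

  long getMaxHeldNanos() { return maxHeldNanos.get(); }
  long getLongHoldCount() { return longHoldCount.get(); }
}
```

A caller would wrap each critical section in lock() and a finally-guarded unlock(), passing System::nanoTime as the clock. Recording after the release keeps the logging cost out of the measured critical section.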






[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Attachment: HDFS-10682.002.patch

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch
>
>
> Add a metric to measure how long a thread holds the FSDataSetImpl lock. The 
> goal is to expose this so that users can identify operations that lock the 
> dataset for a long time ("long" in some sense) and can understand, reason 
> about, and track those operations based on logs.






[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Status: In Progress  (was: Patch Available)

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch
>
>
> Add a metric to measure how long a thread holds the FSDataSetImpl lock. The 
> goal is to expose this so that users can identify operations that lock the 
> dataset for a long time ("long" in some sense) and can understand, reason 
> about, and track those operations based on logs.






[jira] [Updated] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10688:
--
Status: Patch Available  (was: In Progress)

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch, HDFS-10688.002.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception when doing a Kerberos realm lookup. The 
> tight loop eventually caused the DataNode to send hundreds of DNS lookup 
> queries.
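
The actual patch is not shown in this thread, so the following is only a rough Java sketch of the general remedy: pause after a local IOException just as the RemoteException branch already does. The class OfferServiceBackoffSketch, the Step and Sleeper interfaces, and the bounded runLoop are hypothetical names introduced for illustration; the helper name sleepAfterException is borrowed from the review suggestion on this issue.

```java
import java.io.IOException;

/** Hypothetical sketch of backing off after a local IOException; not the real patch. */
class OfferServiceBackoffSketch {
  /** One iteration of the service loop (stand-in for the real work). */
  interface Step { void run() throws IOException; }
  /** Injectable sleep so the behavior can be checked without real delays. */
  interface Sleeper { void sleep(long millis) throws InterruptedException; }

  private final long heartBeatIntervalMs;
  private final Sleeper sleeper;

  OfferServiceBackoffSketch(long heartBeatIntervalMs, Sleeper sleeper) {
    this.heartBeatIntervalMs = heartBeatIntervalMs;
    this.sleeper = sleeper;
  }

  /** Pause for at most one second, preserving the interrupt flag. */
  void sleepAfterException() {
    long sleepTime = Math.min(1000, heartBeatIntervalMs);
    try {
      sleeper.sleep(sleepTime);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }

  /** Runs the step up to maxIterations times and returns the number of failures. */
  int runLoop(Step step, int maxIterations) {
    int failures = 0;
    for (int i = 0; i < maxIterations && !Thread.currentThread().isInterrupted(); i++) {
      try {
        step.run();
      } catch (IOException e) {
        failures++;
        System.err.println("IOException in offerService: " + e.getMessage());
        sleepAfterException();  // without this pause the loop retries immediately
      }
    }
    return failures;
  }
}
```

In production the sleeper would simply be Thread::sleep. With a heartbeat interval of 3000 ms, each failure costs a one-second pause, which caps the retry rate at roughly one attempt per second instead of an unbounded spin.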






[jira] [Updated] (HDFS-10609) Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications

2016-07-25 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10609:
---
Status: Patch Available  (was: Open)

> Uncaught InvalidEncryptionKeyException during pipeline recovery may abort 
> downstream applications
> -
>
> Key: HDFS-10609
> URL: https://issues.apache.org/jira/browse/HDFS-10609
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.6.0
> Environment: CDH5.8.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10609.001.patch
>
>
> In normal operations, if SASL negotiation fails due to 
> {{InvalidEncryptionKeyException}}, it is typically a benign exception, which 
> is caught and retried:
> {code:title=SaslDataTransferServer#doSaslHandshake}
>   if (ioe instanceof SaslException &&
>       ioe.getCause() != null &&
>       ioe.getCause() instanceof InvalidEncryptionKeyException) {
>     // This could just be because the client is long-lived and hasn't gotten
>     // a new encryption key from the NN in a while. Upon receiving this
>     // error, the client will get a new encryption key from the NN and retry
>     // connecting to this DN.
>     sendInvalidKeySaslErrorMessage(out, ioe.getCause().getMessage());
>   } 
> {code}
> {code:title=DFSOutputStream.DataStreamer#createBlockOutputStream}
> if (ie instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0) {
>   DFSClient.LOG.info("Will fetch a new encryption key and retry, " 
>       + "encryption key was invalid when connecting to "
>       + nodes[0] + " : " + ie);
> {code}
> However, if the exception is thrown during pipeline recovery, the 
> corresponding code does not handle it properly, and the exception propagates 
> to downstream applications, such as Solr, aborting their operation:
> {quote}
> 2016-07-06 12:12:51,992 ERROR org.apache.solr.update.HdfsTransactionLog: 
> Exception closing tlog.
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=557709482) doesn't exist. Current key: 1350592619
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1308)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1272)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1433)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1147)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:632)
> 2016-07-06 12:12:51,997 ERROR org.apache.solr.update.CommitTracker: auto 
> commit error...:org.apache.solr.common.SolrException: 
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=557709482) doesn't exist. Current key: 1350592619
> at 
> org.apache.solr.update.HdfsTransactionLog.close(HdfsTransactionLog.java:316)
> at 
> org.apache.solr.update.TransactionLog.decref(TransactionLog.java:505)
> at org.apache.solr.update.UpdateLog.addOldLog(UpdateLog.java:380)
> at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:676)
> at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:623)
> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)

[jira] [Updated] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10688:
--
Status: In Progress  (was: Patch Available)

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch, HDFS-10688.002.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception when doing a Kerberos realm lookup. The 
> tight loop eventually caused the DataNode to send hundreds of DNS lookup 
> queries.






[jira] [Commented] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392608#comment-15392608
 ] 

Chen Liang commented on HDFS-10688:
---

Thanks for the review and suggestion! Just submitted an updated patch.

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch, HDFS-10688.002.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception when doing a Kerberos realm lookup. The 
> tight loop eventually caused the DataNode to send hundreds of DNS lookup 
> queries.






[jira] [Updated] (HDFS-10609) Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications

2016-07-25 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10609:
---
Attachment: HDFS-10609.001.patch

Patch v01. A simple fix and a test case.

The test case creates a scenario where a file is present on the cluster, and 
the cluster's BlockTokenSecretManager is manipulated such that the block's 
token expires after a short duration. After sleeping for 15 seconds, the test 
shuts down the datanode, and the client writes to the file again to induce 
pipeline recovery.

Note that because the client retries block transfer three times, there are two 
possible exceptions without the fix:
* InvalidEncryptionKeyException, because the token expires
* IOException "Failed to replace a bad datanode on the existing pipeline due to 
no more good datanodes being available to try", because the cluster has only 4 
datanodes: after the first attempt fails with InvalidEncryptionKeyException, 
which excludes one datanode, the subsequent attempt sees this exception.
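
For readers unfamiliar with the refetch-and-retry handling that the issue description quotes for the normal write path, here is a small Java sketch of that pattern applied to a generic transfer. It is not the contents of the attached patch. StaleKeyException is a hypothetical stand-in for InvalidEncryptionKeyException, and Transfer, KeySource, and transferWithRefetch are names invented for this example.

```java
import java.io.IOException;

/** Hypothetical sketch of "refetch the key and retry"; not the HDFS-10609 patch. */
class KeyRefreshRetrySketch {
  /** Stand-in for InvalidEncryptionKeyException (assumed to be an IOException). */
  static class StaleKeyException extends IOException {
    StaleKeyException(String msg) { super(msg); }
  }

  /** A transfer attempt made with a given encryption key ID. */
  interface Transfer { void send(int keyId) throws IOException; }
  /** Source of fresh keys, e.g. a call to the NameNode in the real system. */
  interface KeySource { int fetchNewKey(); }

  /**
   * Attempts the transfer, refetching the key on a stale-key failure.
   * Returns the number of attempts made; rethrows once the budget is spent.
   */
  static int transferWithRefetch(Transfer transfer, KeySource keys,
      int initialKeyId, int refetchBudget) throws IOException {
    int keyId = initialKeyId;
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        transfer.send(keyId);
        return attempts;
      } catch (StaleKeyException e) {
        if (refetchBudget <= 0) {
          throw e;                   // out of retries: surface the error
        }
        refetchBudget--;
        keyId = keys.fetchNewKey();  // benign case: get a fresh key and retry
      }
    }
  }
}
```

The point of the sketch is that a stale key is treated as recoverable and only becomes visible to the caller after the refetch budget is exhausted. Any other IOException still propagates immediately.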

> Uncaught InvalidEncryptionKeyException during pipeline recovery may abort 
> downstream applications
> -
>
> Key: HDFS-10609
> URL: https://issues.apache.org/jira/browse/HDFS-10609
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.6.0
> Environment: CDH5.8.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10609.001.patch
>
>
> In normal operations, if SASL negotiation fails due to 
> {{InvalidEncryptionKeyException}}, it is typically a benign exception, which 
> is caught and retried:
> {code:title=SaslDataTransferServer#doSaslHandshake}
>   if (ioe instanceof SaslException &&
>       ioe.getCause() != null &&
>       ioe.getCause() instanceof InvalidEncryptionKeyException) {
>     // This could just be because the client is long-lived and hasn't gotten
>     // a new encryption key from the NN in a while. Upon receiving this
>     // error, the client will get a new encryption key from the NN and retry
>     // connecting to this DN.
>     sendInvalidKeySaslErrorMessage(out, ioe.getCause().getMessage());
>   } 
> {code}
> {code:title=DFSOutputStream.DataStreamer#createBlockOutputStream}
> if (ie instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0) {
>   DFSClient.LOG.info("Will fetch a new encryption key and retry, " 
>       + "encryption key was invalid when connecting to "
>       + nodes[0] + " : " + ie);
> {code}
> However, if the exception is thrown during pipeline recovery, the 
> corresponding code does not handle it properly, and the exception propagates 
> to downstream applications, such as Solr, aborting their operation:
> {quote}
> 2016-07-06 12:12:51,992 ERROR org.apache.solr.update.HdfsTransactionLog: 
> Exception closing tlog.
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=557709482) doesn't exist. Current key: 1350592619
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1308)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1272)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1433)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1147)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:632)
> 2016-07-06 12:12:51,997 ERROR org.apache.solr.update.CommitTracker: auto 
> commit error...:org.apache.solr.common.SolrException: 
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyI

[jira] [Updated] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10688:
--
Status: Patch Available  (was: In Progress)

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch, HDFS-10688.002.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception when doing a Kerberos realm lookup. The 
> tight loop eventually caused the DataNode to send hundreds of DNS lookup 
> queries.






[jira] [Updated] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10688:
--
Attachment: HDFS-10688.002.patch

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch, HDFS-10688.002.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception when doing a Kerberos realm lookup. The 
> tight loop eventually caused the DataNode to send hundreds of DNS lookup 
> queries.






[jira] [Updated] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10688:
--
Status: In Progress  (was: Patch Available)

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch, HDFS-10688.002.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception when doing a Kerberos realm lookup. The 
> tight loop eventually caused the DataNode to send hundreds of DNS lookup 
> queries.






[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-10301:
---
Status: Patch Available  (was: Open)

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. It 
> then sends the block report again. While processing these two reports at the 
> same time, the NameNode can interleave the processing of storages from 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombies. Replicas from zombie storages 
> are immediately removed, causing missing blocks.
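
The interleaving can be illustrated with a toy Java model. It rests on an assumption that the description only implies: that zombie detection runs when the last storage of a report has been processed and flags every storage whose last-seen report ID differs from that report's ID. The class ZombieInterleavingSketch and its method are hypothetical and do not mirror the NameNode code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the race (assumed detection rule; not NameNode code). */
class ZombieInterleavingSketch {
  // storage ID -> ID of the last block report that touched it
  private final Map<String, Long> lastReportId = new LinkedHashMap<>();

  ZombieInterleavingSketch(String... storages) {
    for (String s : storages) {
      lastReportId.put(s, 0L);
    }
  }

  /**
   * Processes one storage of a report. When it is the report's last storage,
   * returns the storages that the assumed rule would declare zombie.
   */
  List<String> processStorage(long reportId, String storage, boolean lastInReport) {
    lastReportId.put(storage, reportId);
    List<String> zombies = new ArrayList<>();
    if (lastInReport) {
      for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
        if (e.getValue() != reportId) {
          zombies.add(e.getKey());   // not seen in "this" report
        }
      }
    }
    return zombies;
  }
}
```

Take storages A and B, an original report with ID 1, and its retransmission with ID 2. If the storages are processed in the order 1:A, 2:A, 2:B, 1:B, then when report 1 finishes, storage A carries ID 2 and is flagged, even though it is healthy and was reported both times.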






[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Attachment: (was: HDFS-10682.002.patch)

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch
>
>
> Add a metric to measure how long a thread holds the FSDataSetImpl lock. The 
> goal is to expose this so that users can identify operations that lock the 
> dataset for a long time ("long" in some sense) and can understand, reason 
> about, and track those operations based on logs.






[jira] [Issue Comment Deleted] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Comment: was deleted

(was: removed an unnecessary instrumentation call.)

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch
>
>
> Add a metric to measure how long a thread holds the FSDataSetImpl lock. The 
> goal is to expose this so that users can identify operations that lock the 
> dataset for a long time ("long" in some sense) and can understand, reason 
> about, and track those operations based on logs.






[jira] [Commented] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392587#comment-15392587
 ] 

Jing Zhao commented on HDFS-10688:
--

Thanks for the fix, [~vagarychen]! Maybe we can rename the new {{sleep}} method 
to a more specific name like {{sleepAfterException}}? Other than this +1.

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Jing Zhao
>Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster, 
> we saw a DataNode hit an exception when doing a Kerberos realm lookup. The 
> tight loop eventually caused the DataNode to send hundreds of DNS lookup 
> queries.






[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Status: Patch Available  (was: In Progress)

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch
>
>
> Add a metric to measure how long a thread holds the FSDataSetImpl lock. The 
> goal is to expose this so that users can identify operations that lock the 
> dataset for a long time ("long" in some sense) and can understand, reason 
> about, and track those operations based on logs.






[jira] [Work started] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-10682 started by Chen Liang.
-
> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch
>
>
> Add a metric to measure how long a thread holds the FSDataSetImpl lock. The 
> goal is to expose this so that users can identify operations that lock the 
> dataset for a long time ("long" in some sense) and can understand, reason 
> about, and track those operations based on the logs.






[jira] [Commented] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392583#comment-15392583
 ] 

Chen Liang commented on HDFS-10682:
---

Removed an unnecessary instrumentation call.

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch
>
>
> Add a metric to measure how long a thread holds the FSDataSetImpl lock. The 
> goal is to expose this so that users can identify operations that lock the 
> dataset for a long time ("long" in some sense) and can understand, reason 
> about, and track those operations based on the logs.






[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Attachment: HDFS-10682.002.patch

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch
>
>
> Add a metric to measure how long a thread holds the FSDataSetImpl lock. The 
> goal is to expose this so that users can identify operations that lock the 
> dataset for a long time ("long" in some sense) and can understand, reason 
> about, and track those operations based on the logs.






[jira] [Updated] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-10688:
-
Status: Patch Available  (was: Open)

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Jing Zhao
> Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode hits a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster we 
> saw a DataNode hit an exception during a Kerberos realm lookup, and the 
> tight loop ended up making the DataNode send hundreds of DNS lookup queries.






[jira] [Commented] (HDFS-10684) WebHDFS calls fail when boolean parameters not provided

2016-07-25 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392575#comment-15392575
 ] 

John Zhuge commented on HDFS-10684:
---

[~loungerdork], I was not able to reproduce this on a 2.7.1 pseudo cluster, a 
trunk pseudo cluster, or a 2.6.0 cluster. Is it possible that your DataNode 
{{hadoop-datanode1}} is running a newer version than the NameNode 
{{hadoop-primarynamenode}}?

> WebHDFS calls fail when boolean parameters not provided
> ---
>
> Key: HDFS-10684
> URL: https://issues.apache.org/jira/browse/HDFS-10684
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 2.7.1
> Reporter: Samuel Low
> Assignee: John Zhuge
>
> Optional boolean parameters that are not provided in the URL cause the 
> WebHDFS create file command to fail.
> curl -i -X PUT "http://hadoop-primarynamenode:50070/webhdfs/v1/tmp/test1234?op=CREATE&overwrite=false"
> Response:
> HTTP/1.1 307 TEMPORARY_REDIRECT
> Cache-Control: no-cache
> Expires: Fri, 15 Jul 2016 04:10:13 GMT
> Date: Fri, 15 Jul 2016 04:10:13 GMT
> Pragma: no-cache
> Expires: Fri, 15 Jul 2016 04:10:13 GMT
> Date: Fri, 15 Jul 2016 04:10:13 GMT
> Pragma: no-cache
> Content-Type: application/octet-stream
> Location: http://hadoop-datanode1:50075/webhdfs/v1/tmp/test1234?op=CREATE&namenoderpcaddress=hadoop-primarynamenode:8020&overwrite=false
> Content-Length: 0
> Server: Jetty(6.1.26)
> Following the redirect:
> curl -i -X PUT -T MYFILE "http://hadoop-datanode1:50075/webhdfs/v1/tmp/test1234?op=CREATE&namenoderpcaddress=hadoop-primarynamenode:8020&overwrite=false"
> Response:
> HTTP/1.1 100 Continue
> HTTP/1.1 400 Bad Request
> Content-Type: application/json; charset=utf-8
> Content-Length: 162
> Connection: close
>
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"Failed to parse \"null\" to Boolean."}}
> The problem can be worked around by providing both the "createparent" and 
> "overwrite" parameters.
> However, this is not possible when I have no control over the WebHDFS calls; 
> for example, Ambari and Hue fail because of this.
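
The error message in the report suggests that an absent parameter reaches the Boolean parser as null. A minimal sketch of the usual defensive pattern follows: fall back to a default when the parameter is missing, and reject only values that are present but malformed. The class OptionalBooleanParamSketch and both of its methods are hypothetical names chosen for this illustration. They are not the WebHDFS parameter classes, and the sketch makes no claim about where the actual bug lies.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not the WebHDFS parameter classes. */
final class OptionalBooleanParamSketch {

  /** Splits "a=1&b=2" into a map. URL decoding is omitted to keep the sketch short. */
  static Map<String, String> parseQuery(String query) {
    Map<String, String> params = new HashMap<>();
    for (String pair : query.split("&")) {
      if (pair.isEmpty()) {
        continue;
      }
      int eq = pair.indexOf('=');
      String key = eq < 0 ? pair : pair.substring(0, eq);
      String value = eq < 0 ? "" : pair.substring(eq + 1);
      params.put(key, value);
    }
    return params;
  }

  /** Returns defaultValue when the parameter is absent instead of trying to parse null. */
  static boolean parseBoolean(String raw, boolean defaultValue) {
    if (raw == null || raw.isEmpty()) {
      return defaultValue;
    }
    if ("true".equalsIgnoreCase(raw)) {
      return true;
    }
    if ("false".equalsIgnoreCase(raw)) {
      return false;
    }
    // Only a value that is present but malformed is treated as an error.
    throw new IllegalArgumentException("Failed to parse \"" + raw + "\" to Boolean.");
  }
}
```

With the query string from the redirect above, parseBoolean(params.get("overwrite"), ...) returns false because the value is present, while parseBoolean(params.get("createparent"), ...) returns whatever default the caller passes, because the key is missing from the map.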






[jira] [Updated] (HDFS-10676) Add namenode metric to measure time spent in generating EDEKs

2016-07-25 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-10676:
--
Status: In Progress  (was: Patch Available)

> Add namenode metric to measure time spent in generating EDEKs
> -
>
> Key: HDFS-10676
> URL: https://issues.apache.org/jira/browse/HDFS-10676
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0-alpha1
> Reporter: Hanisha Koneru
> Assignee: Hanisha Koneru
> Labels: metrics, namenode
> Attachments: HDFS-10676.000.patch
>
>
> A metric to measure the time the NameNode spends interacting with the Key 
> Management System (KMS).






[jira] [Updated] (HDFS-10684) WebHDFS calls fail when boolean parameters not provided

2016-07-25 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-10684:
--
Labels:   (was: newbie)

> WebHDFS calls fail when boolean parameters not provided
> ---
>
> Key: HDFS-10684
> URL: https://issues.apache.org/jira/browse/HDFS-10684
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 2.7.1
> Reporter: Samuel Low
>
> Optional boolean parameters that are not provided in the URL cause the 
> WebHDFS create file command to fail.
> curl -i -X PUT "http://hadoop-primarynamenode:50070/webhdfs/v1/tmp/test1234?op=CREATE&overwrite=false"
> Response:
> HTTP/1.1 307 TEMPORARY_REDIRECT
> Cache-Control: no-cache
> Expires: Fri, 15 Jul 2016 04:10:13 GMT
> Date: Fri, 15 Jul 2016 04:10:13 GMT
> Pragma: no-cache
> Expires: Fri, 15 Jul 2016 04:10:13 GMT
> Date: Fri, 15 Jul 2016 04:10:13 GMT
> Pragma: no-cache
> Content-Type: application/octet-stream
> Location: http://hadoop-datanode1:50075/webhdfs/v1/tmp/test1234?op=CREATE&namenoderpcaddress=hadoop-primarynamenode:8020&overwrite=false
> Content-Length: 0
> Server: Jetty(6.1.26)
> Following the redirect:
> curl -i -X PUT -T MYFILE "http://hadoop-datanode1:50075/webhdfs/v1/tmp/test1234?op=CREATE&namenoderpcaddress=hadoop-primarynamenode:8020&overwrite=false"
> Response:
> HTTP/1.1 100 Continue
> HTTP/1.1 400 Bad Request
> Content-Type: application/json; charset=utf-8
> Content-Length: 162
> Connection: close
>
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"Failed to parse \"null\" to Boolean."}}
> The problem can be worked around by providing both the "createparent" and 
> "overwrite" parameters.
> However, this is not possible when I have no control over the WebHDFS calls; 
> for example, Ambari and Hue fail because of this.






[jira] [Assigned] (HDFS-10684) WebHDFS calls fail when boolean parameters not provided

2016-07-25 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge reassigned HDFS-10684:
-

Assignee: John Zhuge

> WebHDFS calls fail when boolean parameters not provided
> ---
>
> Key: HDFS-10684
> URL: https://issues.apache.org/jira/browse/HDFS-10684
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 2.7.1
> Reporter: Samuel Low
> Assignee: John Zhuge
>
> Optional boolean parameters that are not provided in the URL cause the 
> WebHDFS create file command to fail.
> curl -i -X PUT "http://hadoop-primarynamenode:50070/webhdfs/v1/tmp/test1234?op=CREATE&overwrite=false"
> Response:
> HTTP/1.1 307 TEMPORARY_REDIRECT
> Cache-Control: no-cache
> Expires: Fri, 15 Jul 2016 04:10:13 GMT
> Date: Fri, 15 Jul 2016 04:10:13 GMT
> Pragma: no-cache
> Expires: Fri, 15 Jul 2016 04:10:13 GMT
> Date: Fri, 15 Jul 2016 04:10:13 GMT
> Pragma: no-cache
> Content-Type: application/octet-stream
> Location: http://hadoop-datanode1:50075/webhdfs/v1/tmp/test1234?op=CREATE&namenoderpcaddress=hadoop-primarynamenode:8020&overwrite=false
> Content-Length: 0
> Server: Jetty(6.1.26)
> Following the redirect:
> curl -i -X PUT -T MYFILE "http://hadoop-datanode1:50075/webhdfs/v1/tmp/test1234?op=CREATE&namenoderpcaddress=hadoop-primarynamenode:8020&overwrite=false"
> Response:
> HTTP/1.1 100 Continue
> HTTP/1.1 400 Bad Request
> Content-Type: application/json; charset=utf-8
> Content-Length: 162
> Connection: close
>
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"Failed to parse \"null\" to Boolean."}}
> The problem can be worked around by providing both the "createparent" and 
> "overwrite" parameters.
> However, this is not possible when I have no control over the WebHDFS calls; 
> for example, Ambari and Hue fail because of this.






[jira] [Commented] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392455#comment-15392455
 ] 

Chen Liang commented on HDFS-10688:
---

Added a simple patch to fix this by sleeping for a while before re-entering the 
loop.
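
As a rough illustration of the approach described in this comment, the sketch below applies the sleep used in the quoted RemoteException branch to the IOException branch as well. It is not the attached patch: BackoffLoopSketch, the Work interface, and the bounded iteration count are hypothetical stand-ins so the example can run on its own, and only the Math.min(1000, heartBeatInterval) cap is taken from the quoted code.

```java
import java.io.IOException;

/** Hypothetical stand-in for a service loop; not the actual BPServiceActor code. */
final class BackoffLoopSketch {

  /** One unit of work that may fail locally. */
  interface Work {
    void run() throws IOException;
  }

  /** Same cap as the RemoteException branch in the quoted code: at most one second. */
  static long sleepTimeMs(long heartBeatIntervalMs) {
    return Math.min(1000, heartBeatIntervalMs);
  }

  /** Sleeps for the capped interval and preserves the interrupt status if interrupted. */
  static void sleepAfterException(long heartBeatIntervalMs) {
    try {
      Thread.sleep(sleepTimeMs(heartBeatIntervalMs));
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }

  /**
   * Runs the work up to maxIterations times. On IOException it sleeps before the
   * next attempt instead of retrying immediately. Returns the number of failures.
   */
  static int offerService(Work work, int maxIterations, long heartBeatIntervalMs) {
    int failures = 0;
    for (int i = 0; i < maxIterations && !Thread.currentThread().isInterrupted(); i++) {
      try {
        work.run();
      } catch (IOException e) {
        failures++;
        // Without this sleep the loop would spin as fast as the failure occurs.
        sleepAfterException(heartBeatIntervalMs);
      }
    }
    return failures;
  }
}
```

With a persistently failing operation, the retry rate is bounded by the sleep interval rather than by how quickly the exception is thrown, which is what prevents the burst of DNS lookups described in the issue.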

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Jing Zhao
> Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode hits a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster we 
> saw a DataNode hit an exception during a Kerberos realm lookup, and the 
> tight loop ended up making the DataNode send hundreds of DNS lookup queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10688:
--
Attachment: HDFS-10688.001.patch

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Jing Zhao
> Assignee: Chen Liang
> Attachments: HDFS-10688.001.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode hits a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster we 
> saw a DataNode hit an exception during a Kerberos realm lookup, and the 
> tight loop ended up making the DataNode send hundreds of DNS lookup queries.






[jira] [Commented] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392418#comment-15392418
 ] 

Arpit Agarwal commented on HDFS-10688:
--

Nice find [~jingzhao]!

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Jing Zhao
> Assignee: Chen Liang
>
> Currently in BPServiceActor#offerService, when the DataNode hits a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster we 
> saw a DataNode hit an exception during a Kerberos realm lookup, and the 
> tight loop ended up making the DataNode send hundreds of DNS lookup queries.






[jira] [Comment Edited] (HDFS-10688) BPServiceActor may run into a tight loop for sending block report when hitting IOException

2016-07-25 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392376#comment-15392376
 ] 

Jing Zhao edited comment on HDFS-10688 at 7/25/16 5:54 PM:
---

We can follow the same sleep logic as RemoteException when handling IOException.


was (Author: jingzhao):
We can following the same sleep logic as RemoteException when handling 
IOException.

> BPServiceActor may run into a tight loop for sending block report when 
> hitting IOException
> --
>
> Key: HDFS-10688
> URL: https://issues.apache.org/jira/browse/HDFS-10688
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Jing Zhao
> Assignee: Chen Liang
>
> Currently in BPServiceActor#offerService, when the DataNode hits a local 
> IOException, it only logs the exception and immediately re-enters the while 
> loop:
> {code}
>   } catch(RemoteException re) {
>     ...
>     LOG.warn("RemoteException in offerService", re);
>     try {
>       long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>       Thread.sleep(sleepTime);
>     } catch (InterruptedException ie) {
>       Thread.currentThread().interrupt();
>     }
>   } catch (IOException e) {
>     LOG.warn("IOException in offerService", e);
>   }
> {code}
> This tight loop can cause problems. For example, in a production cluster we 
> saw a DataNode hit an exception during a Kerberos realm lookup, and the 
> tight loop ended up making the DataNode send hundreds of DNS lookup queries.





