[jira] [Updated] (HDFS-10820) DataStreamer#Responder close fails lead the error recovery

2016-08-30 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10820:
-
Priority: Minor  (was: Major)

> DataStreamer#Responder close fails lead the error recovery
> --
>
> Key: HDFS-10820
> URL: https://issues.apache.org/jira/browse/HDFS-10820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-10820.001.patch, HDFS-10820.002.patch
>
>
> While working on HDFS-6532, I found that error recovery happens when the 
> responder close fails. The related code:
> {code:title=DataStreamer.java|borderStyle=solid}
>   public void run() {
> long lastPacket = Time.monotonicNow();
> TraceScope scope = null;
> while (!streamerClosed && dfsClient.clientRunning) {
>   // if the Responder encountered an error, shutdown Responder
>   if (errorState.hasError() && response != null) {
> try {
>   response.close();
>   response.join();
>   response = null;
> } catch (InterruptedException e) {
> // If an InterruptedException happens, the response will not be set to
> // null, and that will lead to error recovery.
>   LOG.warn("Caught exception", e);
> }
>   }
> // A finally block is needed here to set response to null
>   ...
> {code}
> See the related comment:
> https://issues.apache.org/jira/browse/HDFS-6532?focusedCommentId=15448770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15448770
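A minimal sketch of the finally-block approach suggested above (assuming the 
surrounding DataStreamer fields; the committed patch reuses {{closeResponder}} 
instead):

{code:title=DataStreamer.java|borderStyle=solid}
      if (errorState.hasError() && response != null) {
        try {
          response.close();
          response.join();
        } catch (InterruptedException e) {
          LOG.warn("Caught exception", e);
        } finally {
          // Clear the reference even when close()/join() is interrupted, so
          // a failed close does not leave a stale responder behind and
          // trigger another round of error recovery.
          response = null;
        }
      }
{code}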



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10820) DataStreamer#Responder close fails lead the error recovery

2016-08-30 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10820:
-
Attachment: HDFS-10820.002.patch

Good catch, thanks [~xiaochen] for the quick review! Attaching a new patch 
that addresses your comment.

> DataStreamer#Responder close fails lead the error recovery
> --
>
> Key: HDFS-10820
> URL: https://issues.apache.org/jira/browse/HDFS-10820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10820.001.patch, HDFS-10820.002.patch
>
>
> While working on HDFS-6532, I found that error recovery happens when the 
> responder close fails. The related code:
> {code:title=DataStreamer.java|borderStyle=solid}
>   public void run() {
> long lastPacket = Time.monotonicNow();
> TraceScope scope = null;
> while (!streamerClosed && dfsClient.clientRunning) {
>   // if the Responder encountered an error, shutdown Responder
>   if (errorState.hasError() && response != null) {
> try {
>   response.close();
>   response.join();
>   response = null;
> } catch (InterruptedException e) {
> // If an InterruptedException happens, the response will not be set to
> // null, and that will lead to error recovery.
>   LOG.warn("Caught exception", e);
> }
>   }
> // A finally block is needed here to set response to null
>   ...
> {code}
> See the related comment:
> https://issues.apache.org/jira/browse/HDFS-6532?focusedCommentId=15448770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15448770



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451232#comment-15451232
 ] 

Hadoop QA commented on HDFS-9668:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-9668 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-9668 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825945/HDFS-9668-5.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16589/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, execution_time.png
>
>
> During an HBase test on tiered HDFS storage (the WAL is stored on 
> SSD/RAMDISK, and all other files are stored on HDD), we observed many 
> threads BLOCKED on FsDatasetImpl in the DataNode for a long time. The 
> following is part of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution time of some FsDatasetImpl operations during the 
> test. The results follow.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD under heavy 
> load take a really long time.

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-30 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-9668:

Fix Version/s: 3.0.0-alpha2
   Status: Patch Available  (was: Open)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, execution_time.png
>
>
> During an HBase test on tiered HDFS storage (the WAL is stored on 
> SSD/RAMDISK, and all other files are stored on HDD), we observed many 
> threads BLOCKED on FsDatasetImpl in the DataNode for a long time. The 
> following is part of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution time of some FsDatasetImpl operations during the 
> test. The results follow.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD under heavy 
> load take a really long time.
> This means one slow finalizeBlock, addBlock or createRbw operation on a 
> slow storage can block all the other such operations in the same DataNode, 
> especially in HBase when many WAL/flusher/compactor threads are configured.
> We need a finer-grained locking mechanism in a new FsDatasetImpl 
> implementation; users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in the DataNode.
> We can implement the lock at either the storage level or the block level.
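As a rough illustration of the storage-level option, a minimal sketch with 
hypothetical names (not the actual FsDatasetImpl patch):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Storage-level locking sketch: one lock object per volume, so a slow
 * createRbw on an HDD volume no longer blocks writers on an SSD/RAMDISK
 * volume. Names are illustrative only.
 */
class PerVolumeLocks {
  private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

  void runLocked(String storageId, Runnable op) {
    Object lock = locks.computeIfAbsent(storageId, k -> new Object());
    synchronized (lock) { // contention is now per-volume, not dataset-wide
      op.run();
    }
  }
}
{code}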



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, 

[jira] [Updated] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-10655:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed to branch-2.7.

> Fix path related byte array conversion bugs
> ---
>
> Key: HDFS-10655
> URL: https://issues.apache.org/jira/browse/HDFS-10655
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10655-branch-2.7.patch, HDFS-10655.patch, 
> HDFS-10655.patch
>
>
> {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple 
> separators, nor does it handle relative paths correctly.
> {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless 
> the specified range is the entire component array.
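For illustration, the expected behavior, as a hedged sketch with hypothetical 
helper names (not the DFSUtil code):

{code}
import java.util.ArrayList;
import java.util.List;

class PathComponents {
  // Split a path into components, collapsing runs of '/' ("/a//b" -> [a, b])
  // and preserving relative paths ("a/b" -> [a, b]).
  static List<String> split(String path) {
    List<String> parts = new ArrayList<>();
    for (String p : path.split("/+")) {
      if (!p.isEmpty()) {
        parts.add(p);
      }
    }
    return parts;
  }

  // Rebuild a path from a sub-range of components; this must be correct even
  // when [offset, offset + length) is not the entire component array.
  static String join(List<String> parts, int offset, int length) {
    StringBuilder sb = new StringBuilder();
    for (int i = offset; i < offset + length; i++) {
      if (sb.length() > 0) {
        sb.append('/');
      }
      sb.append(parts.get(i));
    }
    return sb.toString();
  }
}
{code}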



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-10655:
-
Fix Version/s: 2.7.4

> Fix path related byte array conversion bugs
> ---
>
> Key: HDFS-10655
> URL: https://issues.apache.org/jira/browse/HDFS-10655
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: HDFS-10655-branch-2.7.patch, HDFS-10655.patch, 
> HDFS-10655.patch
>
>
> {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple 
> separators, nor does it handle relative paths correctly.
> {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless 
> the specified range is the entire component array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10820) DataStreamer#Responder close fails lead the error recovery

2016-08-30 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451131#comment-15451131
 ] 

Xiao Chen commented on HDFS-10820:
--

This is a trivial refactor, so no test is needed. The checkstyle error already 
exists and is only improved by this patch.

One super-trivial comment: since we're checking for null in 
{{closeResponder}}, we could remove the null check from the if clause 
{{errorState.hasError() && response != null}}. +1 pending this.

Thanks for working on this, [~linyiqun].
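In other words, a minimal sketch of the suggested simplification (assuming 
{{closeResponder}} null-checks and clears {{response}}, as described above):

{code}
      // if the Responder encountered an error, shutdown Responder;
      // closeResponder() already guards against response == null
      if (errorState.hasError()) {
        closeResponder();
      }
{code}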

> DataStreamer#Responder close fails lead the error recovery
> --
>
> Key: HDFS-10820
> URL: https://issues.apache.org/jira/browse/HDFS-10820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10820.001.patch
>
>
> While working on HDFS-6532, I found that error recovery happens when the 
> responder close fails. The related code:
> {code:title=DataStreamer.java|borderStyle=solid}
>   public void run() {
> long lastPacket = Time.monotonicNow();
> TraceScope scope = null;
> while (!streamerClosed && dfsClient.clientRunning) {
>   // if the Responder encountered an error, shutdown Responder
>   if (errorState.hasError() && response != null) {
> try {
>   response.close();
>   response.join();
>   response = null;
> } catch (InterruptedException e) {
> // If an InterruptedException happens, the response will not be set to
> // null, and that will lead to error recovery.
>   LOG.warn("Caught exception", e);
> }
>   }
> // A finally block is needed here to set response to null
>   ...
> {code}
> See the related comment:
> https://issues.apache.org/jira/browse/HDFS-6532?focusedCommentId=15448770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15448770



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-6532) Intermittent test failure org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt

2016-08-30 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451120#comment-15451120
 ] 

Xiao Chen commented on HDFS-6532:
-

Yep, sadly I'm not able to reproduce this locally at all, either with upstream 
or CDH.
The log I attached is from a CDH code base, where I could use [~andrew.wang]'s 
[dist_test|http://blog.cloudera.com/blog/2016/05/quality-assurance-at-cloudera-distributed-unit-testing/]
 to reproduce this. (So far dist_test doesn't work with upstream yet.)

Feel free to attach here if you're able to get a failure log. Thanks.

> Intermittent test failure 
> org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt
> --
>
> Key: HDFS-6532
> URL: https://issues.apache.org/jira/browse/HDFS-6532
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yiqun Lin
> Attachments: HDFS-6532.001.patch, 
> TEST-org.apache.hadoop.hdfs.TestCrcCorruption.xml
>
>
> Per https://builds.apache.org/job/Hadoop-Hdfs-trunk/1774/testReport, we had 
> the following failure. A local rerun was successful.
> {code}
> Regression
> org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt
> Failing for the past 1 build (Since Failed#1774 )
> Took 50 sec.
> Error Message
> test timed out after 50000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 50000 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2024)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2008)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2107)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:98)
>   at 
> org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt(TestCrcCorruption.java:133)
> {code}
> See relevant exceptions in log
> {code}
> 2014-06-14 11:56:15,283 WARN  datanode.DataNode 
> (BlockReceiver.java:verifyChunks(404)) - Checksum error in block 
> BP-1675558312-67.195.138.30-1402746971712:blk_1073741825_1001 from 
> /127.0.0.1:41708
> org.apache.hadoop.fs.ChecksumException: Checksum error: 
> DFSClient_NONMAPREDUCE_-1139495951_8 at 64512 exp: 1379611785 got: -12163112
>   at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:353)
>   at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:284)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:402)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:537)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:734)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:741)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
>   at java.lang.Thread.run(Thread.java:662)
> 2014-06-14 11:56:15,285 WARN  datanode.DataNode 
> (BlockReceiver.java:run(1207)) - IOException in BlockReceiver.run(): 
> java.io.IOException: Shutting down writer and responder due to a checksum 
> error in received data. The error response has been sent upstream.
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1352)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1278)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1199)
>   at java.lang.Thread.run(Thread.java:662)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever

2016-08-30 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451074#comment-15451074
 ] 

Brahma Reddy Battula commented on HDFS-9696:


Oh, the branch-2.6 patch is the same as the branch-2.7 one.

> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-9696-branch-2.7.patch, HDFS-9696.branch-2.6.patch, 
> HDFS-9696.patch, HDFS-9696.v2.patch
>
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS no longer has any snapshots, but I see filediff records 
> persisted in its fsimage. Since it has been restarted many times and 
> checkpointed over 100 times since then, the records must have been 
> persisted and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever

2016-08-30 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-9696:
---
Attachment: HDFS-9696-branch-2.7.patch

FYR: uploading the branch-2.7 patch that was committed, since there was a 
conflict.

> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-9696-branch-2.7.patch, HDFS-9696.branch-2.6.patch, 
> HDFS-9696.patch, HDFS-9696.v2.patch
>
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS no longer has any snapshots, but I see filediff records 
> persisted in its fsimage. Since it has been restarted many times and 
> checkpointed over 100 times since then, the records must have been 
> persisted and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10426) TestPendingInvalidateBlock failed in trunk

2016-08-30 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10426:
-
Attachment: HDFS-10426.004.patch

I made an improvement based on my last patch. I added a new method for 
setting the existing variable {{blocksInvalidateWorkPct}}, so we can control 
block deletion in BlockManager and reuse that in BlockManager tests in the 
future. Hi [~iwasakims], could you take a quick look at this? Thanks.
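The new method is roughly like this sketch (hypothetical signature; the 
actual patch may differ):

{code}
  @VisibleForTesting
  public void setBlocksInvalidateWorkPct(float blocksInvalidateWorkPct) {
    // Lets tests throttle how many invalidate (block deletion) commands
    // the BlockManager hands out per interval.
    this.blocksInvalidateWorkPct = blocksInvalidateWorkPct;
  }
{code}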

> TestPendingInvalidateBlock failed in trunk
> --
>
> Key: HDFS-10426
> URL: https://issues.apache.org/jira/browse/HDFS-10426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10426.001.patch, HDFS-10426.002.patch, 
> HDFS-10426.003.patch, HDFS-10426.004.patch
>
>
> The test {{TestPendingInvalidateBlock}} fails intermittently. The stack trace:
> {code}
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
> testPendingDeletion(org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock)
>   Time elapsed: 7.703 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeletion(TestPendingInvalidateBlock.java:92)
> {code}
> It looks like the {{invalidateBlock}} has already been removed before we do the check
> {code}
> // restart NN
> cluster.restartNameNode(true);
> dfs.delete(foo, true);
> Assert.assertEquals(0, cluster.getNamesystem().getBlocksTotal());
> Assert.assertEquals(REPLICATION, cluster.getNamesystem()
> .getPendingDeletionBlocks());
> Assert.assertEquals(REPLICATION,
> dfs.getPendingDeletionBlocksCount());
> {code}
> Looking into the related configuration, I found that the property 
> {{dfs.namenode.replication.interval}} is set to just 1 second in this test. 
> If the delete operation is slow after the 
> {{dfs.namenode.startup.delay.block.deletion.sec}} delay has elapsed, this 
> failure can occur. As the stack trace above shows, the failed test took 
> 7.7s, which is more than the 5+1 seconds.
> One way to improve this (a hedged sketch follows):
> * Increase {{dfs.namenode.replication.interval}}
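A hedged sketch of that mitigation in the test setup (illustrative value; 
{{DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY}} maps to 
{{dfs.namenode.replication.interval}}):

{code}
    Configuration conf = new HdfsConfiguration();
    // Give pending deletion more headroom than the 1 second used today
    // (illustrative value only).
    conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY, 10);
{code}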



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10713) Throttle FsNameSystem lock warnings

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450964#comment-15450964
 ] 

Hadoop QA commented on HDFS-10713:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 586 unchanged - 1 fixed = 588 total (was 587) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 58s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 80m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.server.namenode.TestFSNamesystem |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10713 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12826297/HDFS-10713.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 1aab5711ce1f 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 20ae1fa |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16586/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16586/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16586/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16586/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message 

[jira] [Commented] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450902#comment-15450902
 ] 

Hadoop QA commented on HDFS-10817:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
8s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 6 new + 586 unchanged - 1 fixed = 592 total (was 587) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 57s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10817 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12826296/HDFS-10817.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 00070f740520 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 20ae1fa |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16585/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16585/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16585/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16585/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically 

[jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450892#comment-15450892
 ] 

Hadoop QA commented on HDFS-4025:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 66 new + 486 unchanged - 0 fixed = 552 total (was 486) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
56s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 15s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 95m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Dead store to uri in 
org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.getAllJournalNodeAddrs()
  At 
JournalNodeSyncer.java:org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.getAllJournalNodeAddrs()
  At JournalNodeSyncer.java:[line 245] |
|  |  Redundant nullcheck of nsInfo, which is known to be non-null in 
org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.getMissingLogSegments(QJournalProtocolProtos$GetEditLogManifestResponseProto,
 String, String)  Redundant null check at JournalNodeSyncer.java:is known to be 
non-null in 
org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.getMissingLogSegments(QJournalProtocolProtos$GetEditLogManifestResponseProto,
 String, String)  Redundant null check at JournalNodeSyncer.java:[line 285] |
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-4025 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12826292/HDFS-4025.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux e0b1a0edb0ae 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d6d9cff |
| 

[jira] [Commented] (HDFS-10722) Fix race condition in TestEditLog#testBatchedSyncWithClosedLogs

2016-08-30 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450875#comment-15450875
 ] 

Brahma Reddy Battula commented on HDFS-10722:
-

As this was reverted from branch-2.8 (async edit logging is not present in 
branch-2.8), can we remove 2.8.0 from the fix versions?

> Fix race condition in TestEditLog#testBatchedSyncWithClosedLogs
> ---
>
> Key: HDFS-10722
> URL: https://issues.apache.org/jira/browse/HDFS-10722
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10722.patch
>
>
> The test may fail the following assertion if async edit logs are enabled:
> {{logging edit without syncing should do not affect txid expected:<1> but 
> was:<2>}}.  The async thread is doing batched syncs in the background.  
> logSync just ensures the edit is durable, so the txid may increase prior to 
> sync.  It's a race.
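As a rough, hypothetical illustration of the race (not the HDFS test code), a 
background writer can advance the txid between this thread's edit and its 
assertion:

{code}
import java.util.concurrent.atomic.AtomicLong;

class BatchedSyncRaceDemo {
  static final AtomicLong txid = new AtomicLong();

  public static void main(String[] args) throws InterruptedException {
    long mine = txid.incrementAndGet();                 // "my" edit: txid 1
    Thread batcher = new Thread(txid::incrementAndGet); // concurrent edit
    batcher.start();
    batcher.join();
    // txid may now be 2 even though "my" edit was the only one this thread
    // logged -- exactly the expected:<1> but was:<2> failure above.
    System.out.println("my txid=" + mine + ", current txid=" + txid.get());
  }
}
{code}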



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10780) Block replication not proceeding after pipeline recovery -- TestDataNodeHotSwapVolumes fails

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450870#comment-15450870
 ] 

Hadoop QA commented on HDFS-10780:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 62m  
1s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10780 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12826293/HDFS-10780.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 6c1deef2b562 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d6d9cff |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16584/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16584/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Block replication not proceeding after pipeline recovery -- 
> TestDataNodeHotSwapVolumes fails
> 
>
> Key: HDFS-10780
> URL: https://issues.apache.org/jira/browse/HDFS-10780
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-10780.001.patch
>
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test 
> 

[jira] [Commented] (HDFS-10820) DataStreamer#Responder close fails lead the error recovery

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450844#comment-15450844
 ] 

Hadoop QA commented on HDFS-10820:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-client: The 
patch generated 1 new + 77 unchanged - 1 fixed = 78 total (was 78) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 29s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10820 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12826298/HDFS-10820.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 26dddc6a4505 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 20ae1fa |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16587/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16587/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16587/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DataStreamer#Responder close fails lead the error recovery
> --
>
> Key: HDFS-10820
> URL: https://issues.apache.org/jira/browse/HDFS-10820
>  

[jira] [Commented] (HDFS-6532) Intermittent test failure org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt

2016-08-30 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450833#comment-15450833
 ] 

Yiqun Lin commented on HDFS-6532:
-

Thanks [~xiaochen] for the comment.
{quote}
It does look like we can reuse the closeResponder method in the loop
{quote}
Agreed. I have filed a new JIRA, HDFS-10820, to track that. I think we are 
getting close. I'd like to find more clues in the failure logs, but it seems 
the attached failure log was based on the old code, right?

> Intermittent test failure 
> org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt
> --
>
> Key: HDFS-6532
> URL: https://issues.apache.org/jira/browse/HDFS-6532
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yiqun Lin
> Attachments: HDFS-6532.001.patch, 
> TEST-org.apache.hadoop.hdfs.TestCrcCorruption.xml
>
>
> Per https://builds.apache.org/job/Hadoop-Hdfs-trunk/1774/testReport, we had 
> the following failure. A local rerun was successful.
> {code}
> Regression
> org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt
> Failing for the past 1 build (Since Failed#1774 )
> Took 50 sec.
> Error Message
> test timed out after 50000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 50000 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2024)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2008)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2107)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:98)
>   at 
> org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt(TestCrcCorruption.java:133)
> {code}
> See relevant exceptions in log
> {code}
> 2014-06-14 11:56:15,283 WARN  datanode.DataNode 
> (BlockReceiver.java:verifyChunks(404)) - Checksum error in block 
> BP-1675558312-67.195.138.30-1402746971712:blk_1073741825_1001 from 
> /127.0.0.1:41708
> org.apache.hadoop.fs.ChecksumException: Checksum error: 
> DFSClient_NONMAPREDUCE_-1139495951_8 at 64512 exp: 1379611785 got: -12163112
>   at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:353)
>   at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:284)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:402)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:537)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:734)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:741)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
>   at java.lang.Thread.run(Thread.java:662)
> 2014-06-14 11:56:15,285 WARN  datanode.DataNode 
> (BlockReceiver.java:run(1207)) - IOException in BlockReceiver.run(): 
> java.io.IOException: Shutting down writer and responder due to a checksum 
> error in received data. The error response has been sent upstream.
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1352)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1278)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1199)
>   at java.lang.Thread.run(Thread.java:662)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10820) DataStreamer#Responder close fails lead the error recovery

2016-08-30 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450798#comment-15450798
 ] 

Yiqun Lin edited comment on HDFS-10820 at 8/31/16 2:13 AM:
---

Attaching a simple patch to fix this. According to the comment from 
[~xiaochen], I reuse the method {{closeResponder}} here.


was (Author: linyiqun):
Attach a simple patch for fixing this. According the comment from [~xiaochen] , 
I reuse the method {{closeResponder}} here.

> DataStreamer#Responder close fails lead the error recovery
> --
>
> Key: HDFS-10820
> URL: https://issues.apache.org/jira/browse/HDFS-10820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10820.001.patch
>
>
> While working on HDFS-6532, I found that error recovery happens when the 
> responder close fails. The related code:
> {code:title=DataStreamer.java|borderStyle=solid}
>   public void run() {
>     long lastPacket = Time.monotonicNow();
>     TraceScope scope = null;
>     while (!streamerClosed && dfsClient.clientRunning) {
>       // if the Responder encountered an error, shutdown Responder
>       if (errorState.hasError() && response != null) {
>         try {
>           response.close();
>           response.join();
>           response = null;
>         } catch (InterruptedException e) {
>           // If an InterruptedException happens, the response will not be
>           // set to null, and that will lead to the error recovery.
>           LOG.warn("Caught exception", e);
>         }
>       }
>       // A finally block needs to be added here to set response to null
>       ...
> {code}
> See the related comment: 
> https://issues.apache.org/jira/browse/HDFS-6532?focusedCommentId=15448770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15448770



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10820) DataStreamer#Responder close fails lead the error recovery

2016-08-30 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10820:
-
Attachment: HDFS-10820.001.patch

> DataStreamer#Responder close fails lead the error recovery
> --
>
> Key: HDFS-10820
> URL: https://issues.apache.org/jira/browse/HDFS-10820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10820.001.patch
>
>
> While working on HDFS-6532, I found that error recovery happens when the 
> responder close fails. The related code:
> {code:title=DataStreamer.java|borderStyle=solid}
>   public void run() {
>     long lastPacket = Time.monotonicNow();
>     TraceScope scope = null;
>     while (!streamerClosed && dfsClient.clientRunning) {
>       // if the Responder encountered an error, shutdown Responder
>       if (errorState.hasError() && response != null) {
>         try {
>           response.close();
>           response.join();
>           response = null;
>         } catch (InterruptedException e) {
>           // If an InterruptedException happens, the response will not be
>           // set to null, and that will lead to the error recovery.
>           LOG.warn("Caught exception", e);
>         }
>       }
>       // A finally block needs to be added here to set response to null
>       ...
> {code}
> See the related comment: 
> https://issues.apache.org/jira/browse/HDFS-6532?focusedCommentId=15448770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15448770



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10820) DataStreamer#Responder close fails lead the error recovery

2016-08-30 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10820:
-
Status: Patch Available  (was: Open)

Attaching a simple patch to fix this. According to the comment from 
[~xiaochen], I reuse the method {{closeResponder}} here.
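
For illustration, a minimal sketch of the shape of such a fix, assuming the 
surrounding DataStreamer fields ({{response}}, {{LOG}}); this is a sketch, not 
the committed patch:
{code}
// Close the responder and always clear the reference, even when join()
// is interrupted, so the streamer cannot re-enter error recovery on a
// stale responder.
private void closeResponder() {
  if (response != null) {
    try {
      response.close();
      response.join();
    } catch (InterruptedException e) {
      LOG.warn("Caught exception", e);
    } finally {
      response = null; // the finally block guarantees the reset
    }
  }
}
{code}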

> DataStreamer#Responder close fails lead the error recovery
> --
>
> Key: HDFS-10820
> URL: https://issues.apache.org/jira/browse/HDFS-10820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>
> While working on HDFS-6532, I found that error recovery happens when the 
> responder close fails. The related code:
> {code:title=DataStreamer.java|borderStyle=solid}
>   public void run() {
>     long lastPacket = Time.monotonicNow();
>     TraceScope scope = null;
>     while (!streamerClosed && dfsClient.clientRunning) {
>       // if the Responder encountered an error, shutdown Responder
>       if (errorState.hasError() && response != null) {
>         try {
>           response.close();
>           response.join();
>           response = null;
>         } catch (InterruptedException e) {
>           // If an InterruptedException happens, the response will not be
>           // set to null, and that will lead to the error recovery.
>           LOG.warn("Caught exception", e);
>         }
>       }
>       // A finally block needs to be added here to set response to null
>       ...
> {code}
> See the related comment: 
> https://issues.apache.org/jira/browse/HDFS-6532?focusedCommentId=15448770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15448770



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10713) Throttle FsNameSystem lock warnings

2016-08-30 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-10713:
--
Attachment: HDFS-10713.001.patch

Thank you [~arpitagarwal] for reviewing the patch.
I have fixed the errors in the new patch (v001).

> Throttle FsNameSystem lock warnings
> ---
>
> Key: HDFS-10713
> URL: https://issues.apache.org/jira/browse/HDFS-10713
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Arpit Agarwal
>Assignee: Hanisha Koneru
> Attachments: HDFS-10713.000.patch, HDFS-10713.001.patch
>
>
> The NameNode logs a message if the FSNamesystem write lock is held by a 
> thread for over 1 second. These messages can be throttled to at most one 
> per x minutes to avoid potentially filling up the NN logs. We can also log 
> the number of suppressed notices since the last log message.
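
A hypothetical sketch of such throttling (the class name, the printing, and 
the 10-minute interval are invented for illustration; this is not the 
attached patch):
{code}
public class ThrottledLockWarner {
  private static final long INTERVAL_MS = 10 * 60 * 1000L; // "x minutes"
  private long lastWarnMs = 0;
  private int suppressed = 0;

  // Emit at most one warning per interval; count what was suppressed.
  public synchronized void maybeWarn(long lockHeldMs) {
    long now = System.currentTimeMillis();
    if (now - lastWarnMs >= INTERVAL_MS) {
      System.err.println("Write lock held for " + lockHeldMs + " ms ("
          + suppressed + " warnings suppressed since last report)");
      lastWarnMs = now;
      suppressed = 0;
    } else {
      suppressed++; // remember how many notices were dropped
    }
  }
}
{code}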



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10820) DataStreamer#Responder close fails lead the error recovery

2016-08-30 Thread Yiqun Lin (JIRA)
Yiqun Lin created HDFS-10820:


 Summary: DataStreamer#Responder close fails lead the error recovery
 Key: HDFS-10820
 URL: https://issues.apache.org/jira/browse/HDFS-10820
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yiqun Lin
Assignee: Yiqun Lin


While working on HDFS-6532, I found that error recovery happens when the 
responder close fails. The related code:

{code:title=DataStreamer.java|borderStyle=solid}
  public void run() {
    long lastPacket = Time.monotonicNow();
    TraceScope scope = null;
    while (!streamerClosed && dfsClient.clientRunning) {
      // if the Responder encountered an error, shutdown Responder
      if (errorState.hasError() && response != null) {
        try {
          response.close();
          response.join();
          response = null;
        } catch (InterruptedException e) {
          // If an InterruptedException happens, the response will not be
          // set to null, and that will lead to the error recovery.
          LOG.warn("Caught exception", e);
        }
      }
      // A finally block needs to be added here to set response to null
      ...
{code}
See the related comment: 
https://issues.apache.org/jira/browse/HDFS-6532?focusedCommentId=15448770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15448770





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10780) Block replication not proceeding after pipeline recovery -- TestDataNodeHotSwapVolumes fails

2016-08-30 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450779#comment-15450779
 ] 

Manoj Govindassamy commented on HDFS-10780:
---

Filed HDFS-10819 to track Problem 2.

> Block replication not proceeding after pipeline recovery -- 
> TestDataNodeHotSwapVolumes fails
> 
>
> Key: HDFS-10780
> URL: https://issues.apache.org/jira/browse/HDFS-10780
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-10780.001.patch
>
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test 
> testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can have 
> issues such as timeouts or an unreachable datanode, and in this test case 
> the failure was an induced one, as one of the volumes in a datanode was 
> removed while a block write was in progress. Digging further into the logs, 
> when the problem happens in the write pipeline, the error recovery does not 
> happen as expected, leaving block replication never catching up.
> Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.serv
> testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
>   Time elapsed: 44.354 se
> java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 
> replicas
> Results :
> Tests in error: 
>   
> TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714
>  » Timeout
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> The following exceptions are not expected in this test run
> {noformat}
>  614 2016-08-10 12:30:11,269 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block 
> BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG 
> datanode.Da taNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number 
> of active connections is: 2
>  615 java.lang.IllegalMonitorStateException
>  616 at java.lang.Object.wait(Native Method)
>  617 at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
>  618 at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
>  619 at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
>  620 at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
> {noformat}
> {noformat}
>  720 2016-08-10 12:30:11,287 [DataNode: 
> [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
>  [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-projec 
> t/hadoop-hdfs/target/test/data/dfs/data/data2/]]  heartbeating to 
> localhost/127.0.0.1:58788] ERROR datanode.DataNode 
> (BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool 
> BP-18529 88604-172.16.3.66-1470857409044 (Datanode Uuid 
> 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
>  721 java.lang.NullPointerException
>  722 at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
>  723 at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
>  724 at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
>  725 at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
>  726 at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450777#comment-15450777
 ] 

Hadoop QA commented on HDFS-10742:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 9 new + 109 unchanged - 0 fixed = 118 total (was 109) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 58s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 82m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10742 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12826286/HDFS-10742.009.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 741e0e8df98c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d6d9cff |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16582/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16582/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16582/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16582/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Measurement of lock held time in FsDatasetImpl
> 

[jira] [Commented] (HDFS-10813) DiskBalancer: Add the getNodeList method in Command

2016-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450774#comment-15450774
 ] 

Hudson commented on HDFS-10813:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10378 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10378/])
HDFS-10813. DiskBalancer: Add the getNodeList method in Command. (aengineer: 
rev 20ae1fa259b36a7bc11b0f8de1ebf753c858f93c)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/command/Command.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/diskbalancer/command/TestDiskBalancerCommand.java


> DiskBalancer: Add the getNodeList method in Command
> ---
>
> Key: HDFS-10813
> URL: https://issues.apache.org/jira/browse/HDFS-10813
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HDFS-10813.001.patch
>
>
> The method {{Command#getNodeList}} in DiskBalancer was added in HDFS-9545, 
> but it's never used. We can improve it in the following aspects:
> 1. Change {{private}} to {{protected}} so that subclasses can use the 
> method in the future.
> 2. Reuse the method {{Command#getNodeList}} to construct a new method like 
> {{List getNodes(String listArg)}}. This method can be used for getting 
> multiple nodes in the future. For example, if we want to use {{hdfs 
> diskbalancer -report -node}} or {{hdfs diskbalancer -plan}} with multiple 
> specified nodes, that method can be used. Now these commands only support 
> one node.
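
A hypothetical sketch of the proposed helper (the {{String}} element type and 
the comma-separated parsing are assumptions for illustration, not the actual 
patch):
{code}
import java.util.ArrayList;
import java.util.List;

// Parse a comma-separated node argument into a list, so that -report and
// -plan could accept multiple nodes instead of one.
protected List<String> getNodes(String listArg) {
  List<String> nodes = new ArrayList<>();
  if (listArg == null || listArg.isEmpty()) {
    return nodes;
  }
  for (String node : listArg.split(",")) {
    String trimmed = node.trim();
    if (!trimmed.isEmpty()) {
      nodes.add(trimmed);
    }
  }
  return nodes;
}
{code}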



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10819) BlockManager fails to store a good block for a datanode storage after it reported a corrupt block — block replication stuck

2016-08-30 Thread Manoj Govindassamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HDFS-10819:
--
Description: 
TestDataNodeHotSwapVolumes occasionally fails in the unit test 
testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can have 
issues such as timeouts or an unreachable datanode, and in this test case the 
failure was an induced one, as one of the volumes in a datanode was removed 
while a block write was in progress. Digging further into the logs, when the 
problem happens in the write pipeline, the error recovery does not happen as 
expected, leaving block replication never catching up.

Though this problem has the same signature as HDFS-10780, from the logs it 
looks like the code paths taken are totally different, so the root cause could 
be different as well.


  was:
TestDataNodeHotSwapVolumes occasionally fails in the unit test 
testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can have 
issues such as timeouts or an unreachable datanode, and in this test case the 
failure was an induced one, as one of the volumes in a datanode was removed 
while a block write was in progress. Digging further into the logs, when the 
problem happens in the write pipeline, the error recovery does not happen as 
expected, leaving block replication never catching up.

Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.serv
testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
  Time elapsed: 44.354 se
java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 
replicas
Results :
Tests in error: 
  
TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714
 » Timeout
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0

The following exceptions are not expected in this test run
{noformat}
 614 2016-08-10 12:30:11,269 [DataXceiver for client 
DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block 
BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG datanode.Da 
taNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number of active 
connections is: 2
 615 java.lang.IllegalMonitorStateException
 616 at java.lang.Object.wait(Native Method)
 617 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
 618 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
 619 at 
org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
 620 at 
org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
{noformat}

{noformat}
 720 2016-08-10 12:30:11,287 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
 [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-projec 
t/hadoop-hdfs/target/test/data/dfs/data/data2/]]  heartbeating to 
localhost/127.0.0.1:58788] ERROR datanode.DataNode 
(BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool 
BP-18529 88604-172.16.3.66-1470857409044 (Datanode Uuid 
711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
 721 java.lang.NullPointerException
 722 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
 723 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
 724 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
 725 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
 726 at java.lang.Thread.run(Thread.java:745)
{noformat}




> BlockManager fails to store a good block for a datanode storage after it 
> reported a corrupt block — block replication stuck
> ---
>
> Key: HDFS-10819
> URL: https://issues.apache.org/jira/browse/HDFS-10819
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test 
> testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can have 
> issues such as timeouts or an unreachable datanode, and in this test case 
> the failure was an induced one, as one of the volumes in a datanode was 
> removed while a block write was in progress. Digging further into the 

[jira] [Created] (HDFS-10819) BlockManager fails to store a good block for a datanode storage after it reported a corrupt block — block replication stuck

2016-08-30 Thread Manoj Govindassamy (JIRA)
Manoj Govindassamy created HDFS-10819:
-

 Summary: BlockManager fails to store a good block for a datanode 
storage after it reported a corrupt block — block replication stuck
 Key: HDFS-10819
 URL: https://issues.apache.org/jira/browse/HDFS-10819
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.0.0-alpha1
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy


TestDataNodeHotSwapVolumes occasionally fails in the unit test 
testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can have 
issues such as timeouts or an unreachable datanode, and in this test case the 
failure was an induced one, as one of the volumes in a datanode was removed 
while a block write was in progress. Digging further into the logs, when the 
problem happens in the write pipeline, the error recovery does not happen as 
expected, leaving block replication never catching up.

Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.serv
testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
  Time elapsed: 44.354 se
java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 
replicas
Results :
Tests in error: 
  
TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714
 » Timeout
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0

The following exceptions are not expected in this test run
{noformat}
 614 2016-08-10 12:30:11,269 [DataXceiver for client 
DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block 
BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG datanode.Da 
taNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number of active 
connections is: 2
 615 java.lang.IllegalMonitorStateException
 616 at java.lang.Object.wait(Native Method)
 617 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
 618 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
 619 at 
org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
 620 at 
org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
{noformat}

{noformat}
 720 2016-08-10 12:30:11,287 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
 [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-projec 
t/hadoop-hdfs/target/test/data/dfs/data/data2/]]  heartbeating to 
localhost/127.0.0.1:58788] ERROR datanode.DataNode 
(BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool 
BP-18529 88604-172.16.3.66-1470857409044 (Datanode Uuid 
711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
 721 java.lang.NullPointerException
 722 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
 723 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
 724 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
 725 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
 726 at java.lang.Thread.run(Thread.java:745)
{noformat}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10813) DiskBalancer: Add the getNodeList method in Command

2016-08-30 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10813:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

[~linyiqun] Thank you very much for the contribution. I have committed this to 
trunk.

> DiskBalancer: Add the getNodeList method in Command
> ---
>
> Key: HDFS-10813
> URL: https://issues.apache.org/jira/browse/HDFS-10813
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HDFS-10813.001.patch
>
>
> The method {{Command#getNodeList}} in DiskBalancer was added in HDFS-9545, 
> but it's never used. We can improve it in the following aspects:
> 1. Change {{private}} to {{protected}} so that subclasses can use the 
> method in the future.
> 2. Reuse the method {{Command#getNodeList}} to construct a new method like 
> {{List getNodes(String listArg)}}. This method can be used for getting 
> multiple nodes in the future. For example, if we want to use {{hdfs 
> diskbalancer -report -node}} or {{hdfs diskbalancer -plan}} with multiple 
> specified nodes, that method can be used. Now these commands only support 
> one node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450738#comment-15450738
 ] 

Erik Krogen commented on HDFS-10817:


Thanks for the comments [~zhz]! Uploading a new patch addressing some comments 
as described below.

1. Thanks for pointing that out. Updated. 
2. Done. I originally had it that way because I didn't want to grab the 
ThreadLocal while inside the lock, but after further considering the 
performance characteristics I do not think it is an issue.
3. It is my understanding that it is good practice to clean up ThreadLocal 
values when you are finished with them, especially if they are set within a 
thread pool (which, IIUC, they are here). See 
http://stackoverflow.com/a/818120/5594176. However, I do not have much 
experience with ThreadLocals and will defer to your call.
4. I have made those a little more robust.
5. This actually isn't possible under the current {{FSNamesystemLock}} design. 
Within {{FSNamesystemLock}}, the methods that actually lock are not exposed. It 
simply has methods {{writeLock}} and {{readLock}} which return the respective 
lock; the calling class then uses the {{lock}}/{{lockInterruptibly}} methods 
on the returned objects. 
6. This is a good idea. I have added more variety to the threads which are 
tested in {{testFSReadLockLongHoldingReport}}. 
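
For reference, a minimal sketch of the lock-exposure pattern described in 
point 5, with invented names (not the actual {{FSNamesystemLock}} source):
{code}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class NamesystemLockSketch {
  private final ReentrantReadWriteLock coarseLock =
      new ReentrantReadWriteLock(true);

  // The wrapper only returns the locks; it never locks on the caller's behalf.
  Lock readLock() { return coarseLock.readLock(); }
  Lock writeLock() { return coarseLock.writeLock(); }
}

// Caller side:
//   fsLock.readLock().lock();          // or lockInterruptibly()
//   try { /* read path */ } finally { fsLock.readLock().unlock(); }
{code}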

> Add Logging for Long-held NN Read Locks
> ---
>
> Key: HDFS-10817
> URL: https://issues.apache.org/jira/browse/HDFS-10817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Attachments: HDFS-10817.000.patch, HDFS-10817.001.patch, 
> HDFS-10817.002.patch
>
>
> Right now the NameNode will log when a write lock is held for a long time, 
> to help track methods which are causing expensive delays. Let's do the same 
> for read locks, since these operations may also be expensive/long and cause 
> delays. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-10817:
---
Attachment: HDFS-10817.002.patch

> Add Logging for Long-held NN Read Locks
> ---
>
> Key: HDFS-10817
> URL: https://issues.apache.org/jira/browse/HDFS-10817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Attachments: HDFS-10817.000.patch, HDFS-10817.001.patch, 
> HDFS-10817.002.patch
>
>
> Right now the NameNode will log when a write lock is held for a long time, 
> to help track methods which are causing expensive delays. Let's do the same 
> for read locks, since these operations may also be expensive/long and cause 
> delays. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10813) DiskBalancer: Add the getNodeList method in Command

2016-08-30 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450716#comment-15450716
 ] 

Anu Engineer commented on HDFS-10813:
-

[~linyiqun] You are right, I was anticipating the change based on the comments 
and on how the command CLI would change from using this function. Sorry about 
the confusion. I will commit this now, and hopefully when the CLI changes you 
can include the documentation changes along with it. 

> DiskBalancer: Add the getNodeList method in Command
> ---
>
> Key: HDFS-10813
> URL: https://issues.apache.org/jira/browse/HDFS-10813
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HDFS-10813.001.patch
>
>
> The method {{Command#getNodeList}} in DiskBalancer was added in HDFS-9545, 
> but it's never used. We can improve it in the following aspects:
> 1. Change {{private}} to {{protected}} so that subclasses can use the 
> method in the future.
> 2. Reuse the method {{Command#getNodeList}} to construct a new method like 
> {{List getNodes(String listArg)}}. This method can be used for getting 
> multiple nodes in the future. For example, if we want to use {{hdfs 
> diskbalancer -report -node}} or {{hdfs diskbalancer -plan}} with multiple 
> specified nodes, that method can be used. Now these commands only support 
> one node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450701#comment-15450701
 ] 

Hadoop QA commented on HDFS-10655:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
 4s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
59s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
41s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1650 line(s) that end in whitespace. Use 
git apply --whitespace=fix <<patch_file>>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m 
43s{color} | {color:red} The patch 77 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 36s{color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_111. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
19s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}148m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_101 Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
| JDK v1.7.0_111 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | 

[jira] [Commented] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450689#comment-15450689
 ] 

Hadoop QA commented on HDFS-10655:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 
36s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
55s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
15s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1650 line(s) that end in whitespace. Use 
git apply --whitespace=fix <<patch_file>>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m 
43s{color} | {color:red} The patch 77 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m 25s{color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_111. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
18s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_101 Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| JDK v1.7.0_111 Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.balancer.TestBalancer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:c420dfe |
| JIRA Issue | HDFS-10655 |
| JIRA Patch URL | 

[jira] [Updated] (HDFS-10780) Block replication not proceeding after pipeline recovery -- TestDataNodeHotSwapVolumes fails

2016-08-30 Thread Manoj Govindassamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HDFS-10780:
--
Attachment: HDFS-10780.001.patch

More details on Issue 1:

*Problem:*
— After pipeline recovery (from data streaming failures), block replication to 
stale replicas is not happening
— TestDataNodeHotSwapVolumes fails with the “TimeoutException: Timed out 
waiting for /test to reach 3 replicas” signature

*Analysis:*
— Assume write pipeline DN1 —> DN2 —> DN3
— For the {{UNDER_CONSTRUCTION}} Block, NameNode sets the *expected replicas* 
as DN1, DN2, DN3
— DN1 encounters a write issue (here the volume is removed while write is 
in-progress)
— Client detects pipeline issue, triggers pipeline recovery and gets the new 
write pipeline as DN2 —> DN3

— On a successful {{FSNameSystem::updatePipeline}} request from Client, 
NameNode bumps up the Generation Stamp (from 001 to 002) of the 
UnderConstruction (that is, the last) block of the file.
— All the current *expected replicas* are stale, as they have a lesser 
Generation Stamp compared to the new one after the pipeline update.
— NameNode resets the *expected replicas* with the new set of storage ids from 
the updated pipeline, which is {DN2, DN3}

— DNs send their Incremental Block Reports to the NameNode. IBRs can have 
blocks with an old or new Generation Stamp, and these replica blocks can be in 
any state — FINALIZED, RBW, RBR, etc.
— Assume the stale replica DN1 sends an IBR with the following:
— — Replica Block State: RBW
— — Replica Block GS: 001 (stale)
— Assume the good replicas DN2, DN3 send IBRs with the following:
— — Replica Block State: FINALIZED
— — Replica Block GS: 002 (good)


— When {{BlockManager::processAndHandleReportedBlock}} processes Incremental 
Block Reports, for replica blocks in RBW/RBR states the NameNode does not 
check block Generation Stamps until the stored block is COMPLETE. Since the 
block state at the NN is still UNDER_CONSTRUCTION, the *stale RBW block from 
DN1 gets accepted*

— {{BlockManager::addStoredBlockUnderConstruction}} assumes the replica block 
from the corrupt DN1 to be a good one and adds DN1’s StorageInfo to the 
expected replica locations. Refer: 
{{BlockUnderConstructionFeature::addReplicaIfNotPresent}}. Thus the *expected 
replicas* again become (DN1, DN2, DN3).

— Later, when the Client closes the file, {{FSNameSystem}} moves all the 
*expected replicas* to pendingReconstruction. Refer: 
{{FSNameSystem::addCommittedBlocksToPending}}

— {{BlockManager::checkRedundancy}} mistakenly believes the 
pendingReconstruction count of 1 (for DN1) is currently in progress; adding 
this to the live replica count of 2 (for DN2, DN3), it decides no more 
reconstruction is needed, as the total matches the configured replication 
factor of 3.

— Since there wasn’t any block reconstruction triggered for DN1, the test 
times out waiting for the replication factor of 3. 


*Fix:*

— I believe the core issue here is in the processing of IBRs from stale 
replicas. Either 
— — (A) {{BlockManager::checkReplicaCorrupt}} has to tag the block as corrupt 
when the replica state is RBW and the block is not complete, OR
— — (B) {{BlockManager::addStoredBlockUnderConstruction}} should not ADD the 
corrupt replica to the *expected replicas* for the under-construction block

The attached patch has fix (B). I also wrote a unit test to explicitly check 
the expected replica count under the above line of events. 
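
A self-contained toy model of the idea behind fix (B); the types and names 
here are invented for illustration, while the actual patch operates on 
{{BlockManager}} internals:
{code}
import java.util.HashSet;
import java.util.Set;

class UcBlockSketch {
  long generationStamp;                                 // bumped on pipeline update
  final Set<String> expectedReplicas = new HashSet<>(); // datanode ids

  UcBlockSketch(long gs) { this.generationStamp = gs; }

  void onReportedReplica(String datanodeId, long reportedGs) {
    if (reportedGs < generationStamp) {
      // Stale replica from the pre-recovery pipeline (e.g. DN1 with GS 001
      // after the bump to 002): do not re-add it to expected replicas.
      return;
    }
    expectedReplicas.add(datanodeId);
  }
}
{code}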

[~eddyxu], [~andrew.wang], [~yzhangal] can you please take a look at the patch ?

> Block replication not proceeding after pipeline recovery -- 
> TestDataNodeHotSwapVolumes fails
> 
>
> Key: HDFS-10780
> URL: https://issues.apache.org/jira/browse/HDFS-10780
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-10780.001.patch
>
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test 
> testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can have 
> issues such as timeouts or an unreachable datanode, and in this test case 
> the failure was an induced one, as one of the volumes in a datanode was 
> removed while a block write was in progress. Digging further into the logs, 
> when the problem happens in the write pipeline, the error recovery does not 
> happen as expected, leaving block replication never catching up.
> Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.serv
> testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
>   Time elapsed: 44.354 se
> 

[jira] [Updated] (HDFS-10780) Block replication not proceeding after pipeline recovery -- TestDataNodeHotSwapVolumes fails

2016-08-30 Thread Manoj Govindassamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HDFS-10780:
--
Status: Patch Available  (was: Open)

> Block replication not proceeding after pipeline recovery -- 
> TestDataNodeHotSwapVolumes fails
> 
>
> Key: HDFS-10780
> URL: https://issues.apache.org/jira/browse/HDFS-10780
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-10780.001.patch
>
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test 
> testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can have 
> issues such as timeouts or an unreachable datanode, and in this test case 
> the failure was an induced one, as one of the volumes in a datanode was 
> removed while a block write was in progress. Digging further into the logs, 
> when the problem happens in the write pipeline, the error recovery does not 
> happen as expected, leaving block replication never catching up.
> Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.serv
> testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
>   Time elapsed: 44.354 se
> java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 
> replicas
> Results :
> Tests in error: 
>   
> TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714
>  » Timeout
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> The following exceptions are not expected in this test run
> {noformat}
>  614 2016-08-10 12:30:11,269 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block 
> BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG 
> datanode.Da taNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number 
> of active connections is: 2
>  615 java.lang.IllegalMonitorStateException
>  616 at java.lang.Object.wait(Native Method)
>  617 at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
>  618 at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
>  619 at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
>  620 at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
> {noformat}
> {noformat}
>  720 2016-08-10 12:30:11,287 [DataNode: 
> [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
>  [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-projec 
> t/hadoop-hdfs/target/test/data/dfs/data/data2/]]  heartbeating to 
> localhost/127.0.0.1:58788] ERROR datanode.DataNode 
> (BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool 
> BP-18529 88604-172.16.3.66-1470857409044 (Datanode Uuid 
> 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
>  721 java.lang.NullPointerException
>  722 at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
>  723 at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
>  724 at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
>  725 at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
>  726 at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them

2016-08-30 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-4025:
-
Attachment: HDFS-4025.001.patch

Thank you [~jingzhao] for reviewing the patch and suggesting improvements. 
I have moved all the sync logic to a new class and addressed the other 
comments as well.

> QJM: Sychronize past log segments to JNs that missed them
> -
>
> Key: HDFS-4025
> URL: https://issues.apache.org/jira/browse/HDFS-4025
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Hanisha Koneru
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and 
> then comes back, it will be re-added as a valid part of the quorum on the 
> next log roll. However, it will not have a complete history of log segments 
> (i.e. any individual JN may have gaps in its transaction history). This 
> mirrors the behavior of the NameNode when there are multiple local 
> directories specified.
> However, it would be better if a background thread noticed these gaps and 
> "filled them in" by grabbing the segments from other JournalNodes. This 
> increases the resilience of the system when JournalNodes get reformatted or 
> otherwise lose their local disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10808) DiskBalancer does not execute multi-steps plan-redux

2016-08-30 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450677#comment-15450677
 ] 

Anu Engineer commented on HDFS-10808:
-

[~eddyxu] Thank you very much for taking time out to look at the code and ask 
me these pertinent questions.

bq. Wouldn't break be the simplest way to exit the while loop?
If you had one or two breaks, maybe. But as the number of breaks increases, 
the control flow graph becomes quite complex, and it becomes harder to reason 
about the resources and other state at each exit point. An easy way of 
thinking about copyBlocks is as a simple state machine: if you were 
implementing a state machine, you would probably move to an "exit" state and 
use the state machine's default mechanisms to handle it, instead of breaking 
out at each error condition. copyBlocks follows that pattern. So while I agree 
with you in the generic case, in this specific case I think this pattern 
produces code that is easier to reason about.

bq. on the comment.
{noformat}
// check if someone told us exit, treat this as an interruption
// point for the thread, since both getNextBlock and moveBlocAcrossVolume
// can take some time.
{noformat}
I was under the impression that the comment is quite clear; maybe I am mistaken.

There are several conditions under which we would like to exit the copy 
blocks thread. Some of them are states; some are actions with clear side 
effects. What we are trying to do is minimize the effects of both. So we 
introduce the notion of "interruption points" in our copy thread: when we 
invoke a function and encounter a failure condition, we record that 
information so that we bail at the next safe point. In other words, we don't 
exit at the point of error, but simply set the state so that the thread can 
proceed to a point where it is safe for it to exit.

Examples of actions with side effects are copying a data block whose metadata 
has not yet been copied, accumulating a bunch of disk errors (we wait until 5) 
before we can get out, or finding a block that disappears underneath us before 
we can get to it. Since we have all these kinds of external conditions to take 
care of, we simply set a flag telling the system to exit cleanly. This 
paradigm gives us a centralized exit handler, so if the thread had to do some 
specific cleanup based on a certain error, it is still possible to chain those 
error handlers at the exit point.
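
To make the idiom concrete, here is a minimal, hypothetical sketch of the 
pattern described above. It is illustrative only, not the actual DiskBalancer 
code; names like {{setExitFlag}} and {{shouldRun}} mirror the discussion, the 
rest is made up:
{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Error sites only record the intent to exit; the loop head is the single
// "interruption point" where the thread actually bails, so all cleanup can
// be chained at one centralized exit handler.
class CopyLoopSketch {
  private final AtomicBoolean shouldRun = new AtomicBoolean(true);
  private int remainingBlocks = 3; // stand-in for real block iteration
  private int diskErrors = 0;

  private boolean shouldRun() { return shouldRun.get(); }
  private void setExitFlag() { shouldRun.set(false); }

  void copyBlocks() {
    while (true) {
      // the one safe place to stop, checked before each unit of work
      if (!shouldRun()) {
        break;
      }
      if (remainingBlocks == 0) {
        setExitFlag();          // normal completion: flag it, loop back
        continue;
      }
      if (!copyOneBlock() && ++diskErrors >= 5) {
        setExitFlag();          // too many errors: flag it, don't break here
      }
    }
    cleanup();                  // centralized exit handler
  }

  private boolean copyOneBlock() { remainingBlocks--; return true; }
  private void cleanup() { /* chain error-specific handlers here */ }
}
{code}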

Yes, the atomic nature of the shouldRun flag is confusing and perhaps not 
needed. It is an artifact of experimenting with copying multiple blocks while 
I was developing the code. The code had a different structure then, but I 
found that enforcing bandwidth limits was harder and decided to copy a single 
block at a time.

I really appreciate you taking the time to ask these questions and helping to 
make sure that I am on the right path.


 



> DiskBalancer does not execute multi-steps plan-redux
> 
>
> Key: HDFS-10808
> URL: https://issues.apache.org/jira/browse/HDFS-10808
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-10808.001.patch, HDFS-10808.002.patch
>
>
> This is a redo of the fix in HDFS-10598



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10813) DiskBalancer: Add the getNodeList method in Command

2016-08-30 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450640#comment-15450640
 ] 

Yiqun Lin edited comment on HDFS-10813 at 8/31/16 12:59 AM:


Thanks [~anu] for the review. 
{quote}
could you please file a document JIRA that explains how this feature works. 
Since the change is in the CLI, it is good to have the documentation updated too.
{quote}
I am somewhat confused here: I just added one utility method in {{Command}} and 
I haven't changed the current CLI, right? In addition, I will file some JIRAs 
for supporting multiple specified nodes for {{hdfs diskbalancer -report -node}} 
or other commands after this patch is committed.

Correct me if I am wrong.
Thanks.


was (Author: linyiqun):
Thanks [~anu] for the review. 
{quote}
could you please file a document JIRA that explains how this feature works. 
Since the change is in the CLI, it is good to have the documentation updated too.
{quote}
I am somewhat confused here: I just added one utility method in {{Command}} and 
I haven't changed the current CLI, right? In addition, I will file some JIRAs 
for supporting multiple specified nodes for {{hdfs diskbalancer -report -node}} 
or other commands.

Correct me if I am wrong.
Thanks.

> DiskBalancer: Add the getNodeList method in Command
> ---
>
> Key: HDFS-10813
> URL: https://issues.apache.org/jira/browse/HDFS-10813
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HDFS-10813.001.patch
>
>
> The method {{Command#getNodeList}} in DiskBalancer was added in HDFS-9545, 
> but it's never used. We can improve that in the following aspects:
> 1.Change {{private}} to {{protected}} so that the subclass can use that 
> method in the future.
> 2.Reuse the method {{Command#getNodeList}} to construct a new method
> like this {{List getNodes(String listArg)}}. This 
> method can be used for getting multiple nodes in the future. For example, if 
> we want to use {{hdfs diskbalancer -report -node}} or {{hdfs diskbalancer 
> -plan}} with multiple specified nodes, that method can be used. Now these 
> commands only support one node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10813) DiskBalancer: Add the getNodeList method in Command

2016-08-30 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450640#comment-15450640
 ] 

Yiqun Lin commented on HDFS-10813:
--

Thanks [~anu] for the review. 
{quote}
could you please file a document JIRA that explains how this feature works. 
Since the change is in the CLI, it is good to have the documentation updated too.
{quote}
I am somewhat confused here: I just added one utility method in {{Command}} and 
I haven't changed the current CLI, right? In addition, I will file some JIRAs 
for supporting multiple specified nodes for {{hdfs diskbalancer -report -node}} 
or other commands.

Correct me if I am wrong.
Thanks.

> DiskBalancer: Add the getNodeList method in Command
> ---
>
> Key: HDFS-10813
> URL: https://issues.apache.org/jira/browse/HDFS-10813
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HDFS-10813.001.patch
>
>
> The method {{Command#getNodeList}} in DiskBalancer was added in HDFS-9545, 
> but it's never used. We can improve that in the following aspects:
> 1.Change {{private}} to {{protected}} so that the subclass can use that 
> method in the future.
> 2.Reuse the method {{Command#getNodeList}} to construct a new method
> like this {{List getNodes(String listArg)}}. This 
> method can be used for getting multiple nodes in the future. For example, if 
> we want to use {{hdfs diskbalancer -report -node}} or {{hdfs diskbalancer 
> -plan}} with multiple specified nodes, that method can be used. Now these 
> commands only support one node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450609#comment-15450609
 ] 

Hadoop QA commented on HDFS-10817:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 12 new + 586 unchanged - 1 fixed = 598 total (was 587) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 57m 
57s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 78m 55s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10817 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12826271/HDFS-10817.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 7156bedcdb0c 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d6d9cff |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16581/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16581/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16581/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add Logging for Long-held NN Read Locks
> ---
>
> Key: HDFS-10817
> URL: https://issues.apache.org/jira/browse/HDFS-10817
> Project: Hadoop HDFS
>  Issue 

[jira] [Commented] (HDFS-10808) DiskBalancer does not execute multi-steps plan-redux

2016-08-30 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450594#comment-15450594
 ] 

Lei (Eddy) Xu commented on HDFS-10808:
--

Thanks for the explanation, [~anu]

bq. The intent of having the setExitFlag was to reduce the complexity of 
multiple exits from the loop. 

Wouldn't {{break}} be the simplest way to exit the while loop? 

bq. Fortunately for us, the way we do cancel is by cancelling the executor and 
not by relying on this flag. 

It makes me wonder why we have the flag in the first place. And regarding the 
comment on the following code:

{code}
// check if someone told us exit, treat this as an interruption point
// for the thread, since both getNextBlock and moveBlocAcrossVolume
// can take some time.
if (!shouldRun()) {
  break;
}
{code}

If {{shouldRun}} is not used for canceling, it is very confusing to me, because 
{{exitFlag}} and {{shouldRun()}} are only consumed within {{copyBlocks()}}, yet 
an atomic boolean is used for the {{shouldRun}} flag.

> DiskBalancer does not execute multi-steps plan-redux
> 
>
> Key: HDFS-10808
> URL: https://issues.apache.org/jira/browse/HDFS-10808
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-10808.001.patch, HDFS-10808.002.patch
>
>
> This is a redo of the fix in HDFS-10598



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

2016-08-30 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10742:
--
Attachment: HDFS-10742.009.patch

Removed per-method measurement and aggregate stats to minimize overhead for 
cheap operations.

> Measurement of lock held time in FsDatasetImpl
> --
>
> Key: HDFS-10742
> URL: https://issues.apache.org/jira/browse/HDFS-10742
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha2
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10742.001.patch, HDFS-10742.002.patch, 
> HDFS-10742.003.patch, HDFS-10742.004.patch, HDFS-10742.005.patch, 
> HDFS-10742.006.patch, HDFS-10742.007.patch, HDFS-10742.008.patch, 
> HDFS-10742.009.patch
>
>
> This JIRA proposes to measure the time the lock of {{FsDatasetImpl}} is 
> held by a thread. Doing so will allow us to measure lock statistics.
> This can be done by extending the {{AutoCloseableLock}} lock object in 
> {{FsDatasetImpl}}. In the future we can also consider replacing the lock with 
> a read-write lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450558#comment-15450558
 ] 

Zhe Zhang commented on HDFS-10817:
--

Thanks [~xkrogen] for the work! Patch looks pretty good. A few minor issues:
# Some lines are longer than 80 chars. Also I think multi-line comments should 
be:
{code}
/**
 * blah blah
 */
{code}
# Can we unify the logic for read and write locking? I.e. {{lastUnlock}} and 
{{needReport}}
# {{readLockHeldTimeStamp.remove();}} doesn't look necessary?
# It's a little flaky to only assert that the log message contains the method 
name. Can we do a more precise assertion here?
# Retrospectively, we should probably consolidate both write and read lock 
time-keeping in the {{FSNamesystemLock}} class itself. Doesn't make sense for 
both {{writeLock}} and {{writeLockInterruptibly}} to do the check and count 
time. We should file a follow-on JIRA for that.
# Another nice follow-on is to start a number of threads, some with a sleep 
(longer than the threshold) and some without, and then verify that the output 
has the ones with the sleep.


> Add Logging for Long-held NN Read Locks
> ---
>
> Key: HDFS-10817
> URL: https://issues.apache.org/jira/browse/HDFS-10817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Attachments: HDFS-10817.000.patch, HDFS-10817.001.patch
>
>
> Right now the Namenode will log when a write lock is held for a long time to 
> help track methods which are causing expensive delays. Let's do the same for 
> read locks since these operations may also be expensive/long and cause 
> delays. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-9392) Admins support for maintenance state

2016-08-30 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma reassigned HDFS-9392:
-

Assignee: Ming Ma

> Admins support for maintenance state
> 
>
> Key: HDFS-9392
> URL: https://issues.apache.org/jira/browse/HDFS-9392
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0
>
> Attachments: HDFS-9392-2.patch, HDFS-9392-3.patch, HDFS-9392-4.patch, 
> HDFS-9392.patch
>
>
> This is to allow admins to put nodes into maintenance state with an optional 
> timeout value, as well as take nodes out of maintenance state. Likely we will 
> leverage what we come up with in https://issues.apache.org/jira/browse/HDFS-9005.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-10817:
---
Attachment: HDFS-10817.001.patch

Uploaded a new patch that doesn't use semaphores; I forgot about Java's 
CountDownLatch, which is cleaner (used in testing).

> Add Logging for Long-held NN Read Locks
> ---
>
> Key: HDFS-10817
> URL: https://issues.apache.org/jira/browse/HDFS-10817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Attachments: HDFS-10817.000.patch, HDFS-10817.001.patch
>
>
> Right now the Namenode will log when a write lock is held for a long time to 
> help track methods which are causing expensive delays. Let's do the same for 
> read locks since these operations may also be expensive/long and cause 
> delays. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10784) Implement WebHdfsFileSystem#listStatusIterator

2016-08-30 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450440#comment-15450440
 ] 

Xiao Chen commented on HDFS-10784:
--

Thanks for the new patch, Andrew.

Checkstyle is not related, +1 pending the whitespace change.

> Implement WebHdfsFileSystem#listStatusIterator
> --
>
> Key: HDFS-10784
> URL: https://issues.apache.org/jira/browse/HDFS-10784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.6.4
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HDFS-10784.001.patch, HDFS-10784.002.patch, 
> HDFS-10784.003.patch
>
>
> It would be nice to implement the iterative listStatus in WebHDFS so client 
> apps do not need to buffer the full file list for large directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-30 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450435#comment-15450435
 ] 

Fenghua Hu commented on HDFS-10804:
---

[~vagarychen], thanks for the suggestion.
My intention is to avoid using FsDatasetImpl for synchronization, for 
performance reasons, so a private lock object is introduced. But I am not 
quite sure if we can safely do this. What do you think?
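
For illustration, a minimal sketch of what I mean, reusing the constructor 
shown in the description below; the field name is made up:
{code}
// Hypothetical, not a patch: give ReplicaMap its own small mutex instead of
// the whole FsDatasetImpl instance, so threads that only touch the replica
// map no longer contend with every synchronized(this) block on the dataset.
private final Object replicaMapMutex = new Object();

// in FsDatasetImpl#FsDatasetImpl():
volumeMap = new ReplicaMap(replicaMapMutex);

// in FsDatasetImpl#addVolume():
ReplicaMap tempVolumeMap = new ReplicaMap(replicaMapMutex);
{code}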


> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch
>
>
> In the current implementation, ReplicaMap takes an external object as the 
> lock for synchronization.
> In FsDatasetImpl#FsDatasetImpl(), the object used for synchronization is 
> "this", i.e. the FsDatasetImpl: 
> volumeMap = new ReplicaMap(this);
> and in private FsDatasetImpl#addVolume(), "this" object is used for 
> synchronization as well.
> ReplicaMap tempVolumeMap = new ReplicaMap(this);
> I am not sure if we really need such a big object as FsDatasetImpl for 
> ReplicaMap's synchronization. If it's not necessary, changing this could 
> reduce lock contention on the FsDatasetImpl object and improve performance. 
> Could you please give me some suggestions? Thanks a lot!
> Fenghua



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9038) DFS reserved space is erroneously counted towards non-DFS used.

2016-08-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450427#comment-15450427
 ] 

Arpit Agarwal commented on HDFS-9038:
-

Thanks for sticking with this difficult fix [~brahmareddy]. The patch looks 
good to me (I am still reviewing unit tests). Nitpick - 
{{convert(DatanodeInfoProto di)}} can use the new {{DataNodeInfo}} constructor.

[~vinayrpet], [~cnauroth], [~szetszwo], do you have any comments on the latest 
patch?

> DFS reserved space is erroneously counted towards non-DFS used.
> ---
>
> Key: HDFS-9038
> URL: https://issues.apache.org/jira/browse/HDFS-9038
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
> Attachments: GetFree.java, HDFS-9038-002.patch, HDFS-9038-003.patch, 
> HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038-006.patch, 
> HDFS-9038-007.patch, HDFS-9038-008.patch, HDFS-9038-009.patch, 
> HDFS-9038-010.patch, HDFS-9038.patch
>
>
> HDFS-5215 changed the DataNode volume available space calculation to consider 
> the reserved space held by the {{dfs.datanode.du.reserved}} configuration 
> property.  As a side effect, reserved space is now counted towards non-DFS 
> used.  I don't believe it was intentional to change the definition of non-DFS 
> used.  This issue proposes restoring the prior behavior: do not count 
> reserved space towards non-DFS used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450414#comment-15450414
 ] 

Erik Krogen commented on HDFS-10817:


Added a patch to implement this with a configurable threshold. It required the 
use of a ThreadLocal, since multiple threads can hold the read lock 
simultaneously. 
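
As a rough sketch of why a ThreadLocal is needed (names are illustrative and 
the actual patch may differ):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.hadoop.util.Time;

// A single shared timestamp field would be clobbered by concurrent readers,
// so each reader thread tracks its own acquire time in a ThreadLocal.
class ReadLockTimingSketch {
  private final ReentrantReadWriteLock coarseLock = new ReentrantReadWriteLock();
  private final ThreadLocal<Long> readLockHeldTimeStamp = new ThreadLocal<>();
  private final long thresholdMs = 5000; // assumed configurable

  void readLock() {
    coarseLock.readLock().lock();
    if (coarseLock.getReadHoldCount() == 1) { // time only the outermost acquire
      readLockHeldTimeStamp.set(Time.monotonicNow());
    }
  }

  void readUnlock() {
    final boolean outermost = coarseLock.getReadHoldCount() == 1;
    long heldMs = 0;
    if (outermost) {
      heldMs = Time.monotonicNow() - readLockHeldTimeStamp.get();
      readLockHeldTimeStamp.remove(); // don't leak the thread-local entry
    }
    coarseLock.readLock().unlock();
    if (outermost && heldMs > thresholdMs) {
      System.out.println("Read lock held for " + heldMs + " ms"); // LOG in real code
    }
  }
}
{code}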

> Add Logging for Long-held NN Read Locks
> ---
>
> Key: HDFS-10817
> URL: https://issues.apache.org/jira/browse/HDFS-10817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Attachments: HDFS-10817.000.patch
>
>
> Right now the Namenode will log when a write lock is held for a long time to 
> help track methods which are causing expensive delays. Let's do the same for 
> read locks since these operations may also be expensive/long and cause 
> delays. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-10817:
---
Attachment: HDFS-10817.000.patch

> Add Logging for Long-held NN Read Locks
> ---
>
> Key: HDFS-10817
> URL: https://issues.apache.org/jira/browse/HDFS-10817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Attachments: HDFS-10817.000.patch
>
>
> Right now the Namenode will log when a write lock is held for a long time to 
> help track methods which are causing expensive delays. Let's do the same for 
> read locks since these operations may also be expensive/long and cause 
> delays. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-10817:
---
Status: Patch Available  (was: In Progress)

> Add Logging for Long-held NN Read Locks
> ---
>
> Key: HDFS-10817
> URL: https://issues.apache.org/jira/browse/HDFS-10817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>
> Right now the Namenode will log when a write lock is held for a long time to 
> help track methods which are causing expensive delays. Let's do the same for 
> read locks since these operations may also be expensive/long and cause 
> delays. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-10655:
-
Attachment: HDFS-10655-branch-2.7.patch

branch-2.7 patch attached.

> Fix path related byte array conversion bugs
> ---
>
> Key: HDFS-10655
> URL: https://issues.apache.org/jira/browse/HDFS-10655
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10655-branch-2.7.patch, HDFS-10655.patch, 
> HDFS-10655.patch
>
>
> {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple 
> separators, nor does it handle relative paths correctly.
> {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless 
> the specified range is the entire component array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-10655:
-
Status: Patch Available  (was: Reopened)

> Fix path related byte array conversion bugs
> ---
>
> Key: HDFS-10655
> URL: https://issues.apache.org/jira/browse/HDFS-10655
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10655-branch-2.7.patch, HDFS-10655.patch, 
> HDFS-10655.patch
>
>
> {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple 
> separators, nor does it handle relative paths correctly.
> {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless 
> the specified range is the entire component array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-10655:
--

Sorry to reopen this; I want to backport it to branch-2.7 and have a full 
Jenkins run before doing that.

> Fix path related byte array conversion bugs
> ---
>
> Key: HDFS-10655
> URL: https://issues.apache.org/jira/browse/HDFS-10655
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10655-branch-2.7.patch, HDFS-10655.patch, 
> HDFS-10655.patch
>
>
> {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple 
> separators, nor does it handle relative paths correctly.
> {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless 
> the specified range is the entire component array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10818) KerberosAuthenticationHandler#authenticate should not rebuild SPN based on client request

2016-08-30 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-10818:
-

 Summary: KerberosAuthenticationHandler#authenticate should not 
rebuild SPN based on client request
 Key: HDFS-10818
 URL: https://issues.apache.org/jira/browse/HDFS-10818
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


In KerberosAuthenticationHandler#authenticate, we use the canonicalized server 
name derived from the HTTP request to build the server SPN and authenticate 
the client. This can be problematic if the HTTP client/server are running in a 
non-local Kerberos realm that the local realm has trust with (e.g., NN UI).

For example, 
the server is running its HTTP endpoint using an SPN from the client realm:
hadoop.http.authentication.kerberos.principal
HTTP/_HOST@TEST.COM

When the client sends a request to the namenode at example@example.com via 
http://NN.example.com:50070 from somehost.test@test.com, the client talks to 
the KDC first and gets a service ticket HTTP/NN1.example.com@TEST.COM to 
authenticate with the server via SPNEGO negotiation. 

The authentication will end up with either a "no valid credential" error or a 
checksum failure, depending on the HTTP client's name resolution or the Host 
header specified by the browser. 

The root cause is that {{KerberosUtil.getServicePrincipal("HTTP", 
serverName)}} will return an SPN with the local realm 
(HTTP/nn.example@example.com) no matter whether the server login SPN is from 
that realm or not. 

The proposed fix is to use the default server login principal instead (by 
passing null as the 1st parameter to gssManager.createCredential()). This way 
we avoid any dependency on HTTP client behavior (the Host header or name 
resolution like CNAME) or assumptions about the local realm. 
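
A rough sketch of the proposed direction using the standard Java GSS-API; the 
surrounding class is illustrative, the null-name credential is the point:
{code}
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;
import org.ietf.jgss.Oid;

class SpnegoAcceptorSketch {
  // Acquire the acceptor credential from the server's own login subject
  // instead of building an SPN from the client-supplied host name.
  static GSSCredential serverCredential() throws GSSException {
    GSSManager gssManager = GSSManager.getInstance();
    Oid spnegoOid = new Oid("1.3.6.1.5.5.2"); // SPNEGO mechanism OID
    // A null name means "default principal of the current login subject",
    // removing any dependency on the HTTP Host header or on name
    // resolution such as CNAMEs.
    return gssManager.createCredential(
        null, GSSCredential.INDEFINITE_LIFETIME, spnegoOid,
        GSSCredential.ACCEPT_ONLY);
  }
}
{code}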




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8901) Use ByteBuffer in striping positional read

2016-08-30 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450224#comment-15450224
 ] 

Kai Zheng commented on HDFS-8901:
-

Hey [~zhz],

Do you think [~Sammi] and the latest patch have addressed your questions? I'm 
wondering if we could get this in soon and then proceed to the next one, 
HDFS-8957, to catch up with an upcoming release. Thanks!

> Use ByteBuffer in striping positional read
> --
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: SammiChen
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, 
> HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, 
> HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, 
> HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, 
> HDFS-8901.v13.patch, HDFS-8901.v14.patch, HDFS-8901.v15.patch, 
> HDFS-8901.v16.patch, initial-poc.patch
>
>
> Native erasure coder prefers a direct ByteBuffer for performance 
> considerations. To prepare for it, this change uses ByteBuffer throughout 
> the code implementing striping positional read. It will also avoid 
> unnecessary data copying between striping read chunk buffers and decode input 
> buffers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-9392) Admins support for maintenance state

2016-08-30 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma resolved HDFS-9392.
---
Resolution: Fixed

> Admins support for maintenance state
> 
>
> Key: HDFS-9392
> URL: https://issues.apache.org/jira/browse/HDFS-9392
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
> Fix For: 2.9.0
>
> Attachments: HDFS-9392-2.patch, HDFS-9392-3.patch, HDFS-9392-4.patch, 
> HDFS-9392.patch
>
>
> This is to allow admins to put nodes into maintenance state with an optional 
> timeout value, as well as take nodes out of maintenance state. Likely we will 
> leverage what we come up with in https://issues.apache.org/jira/browse/HDFS-9005.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-9392) Admins support for maintenance state

2016-08-30 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma reopened HDFS-9392:
---

> Admins support for maintenance state
> 
>
> Key: HDFS-9392
> URL: https://issues.apache.org/jira/browse/HDFS-9392
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
> Fix For: 2.9.0
>
> Attachments: HDFS-9392-2.patch, HDFS-9392-3.patch, HDFS-9392-4.patch, 
> HDFS-9392.patch
>
>
> This is to allow admins to put nodes into maintenance state with an optional 
> timeout value, as well as take nodes out of maintenance state. Likely we will 
> leverage what we come up with in https://issues.apache.org/jira/browse/HDFS-9005.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9392) Admins support for maintenance state

2016-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450186#comment-15450186
 ] 

Hudson commented on HDFS-9392:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10376 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10376/])
HDFS-9392. Admins support for maintenance state. Contributed by Ming Ma. 
(mingma: rev 9dcbdbdb5a34d85910707f81ebc1bb1f81c99978)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMBean.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AdminStatesBaseTest.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/resources/dfs.hosts.json
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestMaintenanceState.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestCombinedHostsFileReader.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeInfo.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStats.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/HostsFileWriter.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeAdminProperties.java


> Admins support for maintenance state
> 
>
> Key: HDFS-9392
> URL: https://issues.apache.org/jira/browse/HDFS-9392
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
> Fix For: 2.9.0
>
> Attachments: HDFS-9392-2.patch, HDFS-9392-3.patch, HDFS-9392-4.patch, 
> HDFS-9392.patch
>
>
> This is to allow admins to put nodes into maintenance state with an optional 
> timeout value, as well as take nodes out of maintenance state. Likely we will 
> leverage what we come up with in https://issues.apache.org/jira/browse/HDFS-9005.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9392) Admins support for maintenance state

2016-08-30 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9392:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

Thanks [~eddyxu] for the review! I have committed it to trunk and branch-2.

> Admins support for maintenance state
> 
>
> Key: HDFS-9392
> URL: https://issues.apache.org/jira/browse/HDFS-9392
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
> Fix For: 2.9.0
>
> Attachments: HDFS-9392-2.patch, HDFS-9392-3.patch, HDFS-9392-4.patch, 
> HDFS-9392.patch
>
>
> This is to allow admins to put nodes into maintenance state with an optional 
> timeout value, as well as take nodes out of maintenance state. Likely we will 
> leverage what we come up with in https://issues.apache.org/jira/browse/HDFS-9005.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9953) Download File from UI broken after pagination

2016-08-30 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450131#comment-15450131
 ] 

Ravi Prakash commented on HDFS-9953:


Yupp. Sorry about that. I forgot initially and then didn't want to rewrite 
history :(

> Download File from UI broken after pagination
> -
>
> Key: HDFS-9953
> URL: https://issues.apache.org/jira/browse/HDFS-9953
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9953.patch
>
>
>  File links are not working from the second page onwards. This was 
> introduced in HDFS-9084.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450042#comment-15450042
 ] 

Rushabh S Shah commented on HDFS-10816:
---

Forgot to mention +1 (non-binding)

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then subsequently gets the namesystem writelock. However, if the 
> replication monitor fires in between those two instructions, the test will 
> fail as it will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test all that needs to be done is to turn off the replication 
> monitor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450035#comment-15450035
 ] 

Rushabh S Shah commented on HDFS-10816:
---

[~ebadger]: Thanks for reporting and analyzing the failure.
This test broke in our internal build recently.
Below are the relevant logs:
{noformat}
2016-08-29 01:54:49,332 INFO  impl.RamDiskAsyncLazyPersistService 
(RamDiskAsyncLazyPersistService.java:shutdown(169)) - All async lazy persist 
service threads have been shut down
2016-08-29 01:54:49,336 INFO  datanode.DataNode (DataNode.java:shutdown(1791)) 
- Shutdown complete.
2016-08-29 01:54:49,347 INFO  BlockStateChange 
(BlockManager.java:addToInvalidates(1228)) - BLOCK* addToInvalidates: 
blk_1073741825_1001 127.0.0.1:57662 127.0.0.1:43137 127.0.0.1:59637 
2016-08-29 01:54:49,349 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditMessage(8476)) - allowed=true   ugi=tortuga 
(auth:SIMPLE)   ip=/127.0.0.1   cmd=delete  src=/testRR dst=null
perm=null   proto=rpc
2016-08-29 01:54:49,350 INFO  BlockStateChange 
(BlockManager.java:invalidateWorkForOneNode(3582)) - BLOCK* BlockManager: ask 
127.0.0.1:59637 to delete [blk_1073741825_1001]
2016-08-29 01:54:49,355 INFO  hdfs.MiniDFSCluster 
(MiniDFSCluster.java:shutdown(1725)) - Shutting down the Mini HDFS Cluster
{noformat}

bq. 2016-08-29 01:54:49,336 INFO  datanode.DataNode 
(DataNode.java:shutdown(1791)) - Shutdown complete.
This line corresponds to shutting down the last datanode.
bq. 2016-08-29 01:54:49,347 INFO  BlockStateChange 
(BlockManager.java:addToInvalidates(1228)) - BLOCK* addToInvalidates: 
blk_1073741825_1001 127.0.0.1:57662 127.0.0.1:43137 127.0.0.1:59637 
After stopping the last datanode, I can see the InvalidateBlocks size is 3.
bq. 2016-08-29 01:54:49,350 INFO  BlockStateChange 
(BlockManager.java:invalidateWorkForOneNode(3582)) - BLOCK* BlockManager: ask 
127.0.0.1:59637 to delete \[blk_1073741825_1001\]
Then the replication monitor woke up and removed one block from the 
invalidateBlocks set.

I think the test was checking the invalidateBlocks size just after the 
replication monitor computed invalidate work for one node, and that is why it 
failed.
I think stopping the replication monitor is the correct fix.
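
As a sketch of that fix (the exact mechanism in the patch may differ; 
{{dfs.namenode.replication.interval}} is the knob I have in mind):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;

// Hypothetical test setup: make the replication monitor dormant for the
// lifetime of the test by stretching its wake-up interval (in seconds), so
// it cannot race with the test's own invalidate-block bookkeeping.
Configuration conf = new HdfsConfiguration();
conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY,
    Integer.MAX_VALUE); // effectively never fires during the test
{code}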

[~jojochuang], [~zhz]: Since you reviewed HDFS-9580, could you please help 
review this patch?

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then subsequently gets the namesystem writelock. However, if the 
> replication monitor fires in between those two instructions, the test will 
> fail as it will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test all that needs to be done is to turn off the replication 
> monitor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

2016-08-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449980#comment-15449980
 ] 

Arpit Agarwal commented on HDFS-10742:
--

Hi [~vagarychen], a couple of comments:
# {{check()}} should have minimal overhead if the lock was not held for more 
than {{lockWarningThresholdMs}}, which will be the common case. We can simplify 
the patch quite a bit and just add logging, along with the caller stack trace, 
if the lock was held for too long. We can extend it later if we want to add 
performance statistics, but we'd have to do it in a way that avoids object 
allocations.
# InstrumentedLock should probably not extend the concrete class 
AutoCloseableLock. We can make InstrumentedLock a separate class, or 
alternatively make AutoCloseableLock a Java interface with multiple 
implementations.

bq. but in the context of a patch generating thread stacks and maintaining 
persistent maps of timestamps, it's absurd to debate any of this.
Yes I agree. We should avoid that overhead.
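
To make point 1 concrete, a minimal sketch (class and field names are 
illustrative, not from the patch):
{code}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import org.apache.hadoop.util.Time;

// Common path does one timestamp read and a comparison, no allocations;
// only a hold longer than the threshold pays for building a stack trace.
class InstrumentedLockSketch {
  private final Lock lock = new ReentrantLock();
  private final long lockWarningThresholdMs = 300; // assumed configurable
  private long acquireTimeMs;

  void lock() {
    lock.lock();
    acquireTimeMs = Time.monotonicNow();
  }

  void unlock() {
    final long heldMs = Time.monotonicNow() - acquireTimeMs;
    lock.unlock();
    if (heldMs > lockWarningThresholdMs) { // rare case: log with caller stack
      new Exception("Lock held for " + heldMs + " ms").printStackTrace();
    }
  }
}
{code}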

> Measurement of lock held time in FsDatasetImpl
> --
>
> Key: HDFS-10742
> URL: https://issues.apache.org/jira/browse/HDFS-10742
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha2
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10742.001.patch, HDFS-10742.002.patch, 
> HDFS-10742.003.patch, HDFS-10742.004.patch, HDFS-10742.005.patch, 
> HDFS-10742.006.patch, HDFS-10742.007.patch, HDFS-10742.008.patch
>
>
> This JIRA proposes to measure the time the lock of {{FsDatasetImpl}} is 
> held by a thread. Doing so will allow us to measure lock statistics.
> This can be done by extending the {{AutoCloseableLock}} lock object in 
> {{FsDatasetImpl}}. In the future we can also consider replacing the lock with 
> a read-write lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-6532) Intermittent test failure org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt

2016-08-30 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449965#comment-15449965
 ] 

Xiao Chen commented on HDFS-6532:
-

Thanks Yiqun for working on this. It does look like we can reuse the 
{{closeResponder}} method in the loop, but I don't think that's the root cause 
here.

Taking the failure log in the attachment as an example: the test is supposed 
to end quickly (around 15:41:58) after 5 checksum-error failures. But somehow 
it did not, and hangs there until the 50-second test timeout is reached. After 
the test timeout, JUnit interrupts all threads, which is what we see in the 
last 3 messages (around 15:42:43).

I looked into this too, and still think this is some error in triggering or 
handling the interrupt after the 5th checksum error. I don't have any concrete 
progress though.
{noformat}
2016-08-20 15:41:58,084 INFO  datanode.DataNode 
(DataXceiver.java:writeBlock(835)) - opWriteBlock 
BP-1703495320-172.17.0.1-1471707714371:blk_1073741826_1005 received exception 
java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
Unexpected checksum mismatch while writing 
BP-1703495320-172.17.0.1-1471707714371:blk_1073741826_1005 from /127.0.0.1:49059
2016-08-20 15:41:58,084 ERROR datanode.DataNode (DataXceiver.java:run(273)) - 
127.0.0.1:52977:DataXceiver error processing WRITE_BLOCK operation  src: 
/127.0.0.1:49059 dst: /127.0.0.1:52977
java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
Unexpected checksum mismatch while writing 
BP-1703495320-172.17.0.1-1471707714371:blk_1073741826_1005 from /127.0.0.1:49059
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:606)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:896)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:802)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
2016-08-20 15:41:58,258 INFO  BlockStateChange 
(BlockManager.java:invalidateWorkForOneNode(3667)) - BLOCK* BlockManager: ask 
127.0.0.1:51819 to delete [blk_1073741825_1002]
2016-08-20 15:41:58,258 INFO  BlockStateChange 
(BlockManager.java:invalidateWorkForOneNode(3667)) - BLOCK* BlockManager: ask 
127.0.0.1:39731 to delete [blk_1073741825_1002]
2016-08-20 15:41:58,258 INFO  BlockStateChange 
(BlockManager.java:invalidateWorkForOneNode(3667)) - BLOCK* BlockManager: ask 
127.0.0.1:52977 to delete [blk_1073741825_1002]
2016-08-20 15:41:59,235 INFO  BlockStateChange (InvalidateBlocks.java:add(116)) 
- BLOCK* InvalidateBlocks: add blk_1073741825_1001 to 127.0.0.1:49498
2016-08-20 15:41:59,238 INFO  impl.FsDatasetAsyncDiskService 
(FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling 
blk_1073741825_1002 file 
/tmp/run_tha_test5KJcML/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/current/BP-1703495320-172.17.0.1-1471707714371/current/finalized/subdir0/subdir0/blk_1073741825
 for deletion
2016-08-20 15:41:59,240 INFO  impl.FsDatasetAsyncDiskService 
(FsDatasetAsyncDiskService.java:run(295)) - Deleted 
BP-1703495320-172.17.0.1-1471707714371 blk_1073741825_1002 file 
/tmp/run_tha_test5KJcML/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/current/BP-1703495320-172.17.0.1-1471707714371/current/finalized/subdir0/subdir0/blk_1073741825
2016-08-20 15:41:59,378 INFO  impl.FsDatasetAsyncDiskService 
(FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling 
blk_1073741825_1002 file 
/tmp/run_tha_test5KJcML/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data9/current/BP-1703495320-172.17.0.1-1471707714371/current/finalized/subdir0/subdir0/blk_1073741825
 for deletion
2016-08-20 15:41:59,378 INFO  impl.FsDatasetAsyncDiskService 
(FsDatasetAsyncDiskService.java:run(295)) - Deleted 
BP-1703495320-172.17.0.1-1471707714371 blk_1073741825_1002 file 
/tmp/run_tha_test5KJcML/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data9/current/BP-1703495320-172.17.0.1-1471707714371/current/finalized/subdir0/subdir0/blk_1073741825
2016-08-20 15:41:59,698 INFO  impl.FsDatasetAsyncDiskService 
(FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling 
blk_1073741825_1002 file 
/tmp/run_tha_test5KJcML/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data17/current/BP-1703495320-172.17.0.1-1471707714371/current/finalized/subdir0/subdir0/blk_1073741825
 for deletion
2016-08-20 15:41:59,698 INFO  impl.FsDatasetAsyncDiskService 
(FsDatasetAsyncDiskService.java:run(295)) - Deleted 
BP-1703495320-172.17.0.1-1471707714371 blk_1073741825_1002 file 

[jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449921#comment-15449921
 ] 

Wei-Chiu Chuang commented on HDFS-10729:


All failed tests passed locally. [~kihwal] would you like to take a look again? 
Thanks

> NameNode crashes when loading edits because max directory items is exceeded
> ---
>
> Key: HDFS-10729
> URL: https://issues.apache.org/jira/browse/HDFS-10729
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Critical
> Attachments: HDFS-10729.001.patch
>
>
> We encountered a bug where the Standby NameNode crashes due to an NPE when 
> loading edits.
> {noformat}
> 2016-08-05 15:06:00,983 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation AddOp [length=0, inodeId=789272719, path=[path], replication=3, 
> mtime=1470379597935, atime=1470379597935, blockSize=134217728, blocks=[], 
> permissions=:supergroup:rw-r--r--, aclEntries=null, 
> clientName=DFSClient_NONMAPREDUCE_1495395702_1, clientMachine=10.210.119.136, 
> overwrite=true, RpcClientId=a1512eeb-65e4-43dc-8aa8-d7a1af37ed30, 
> RpcCallId=417, storagePolicyId=0, opCode=OP_ADD, txid=4212503758]
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileEncryptionInfo(FSDirectory.java:2914)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.createFileStatus(FSDirectory.java:2469)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:375)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> {noformat}
> The NameNode crashes and cannot be restarted. After some research, we turned 
> on the debug log for org.apache.hadoop.hdfs.StateChange, restarted the NN, and 
> saw the following exception, which induced the NPE:
> {noformat}
> 16/08/07 18:51:15 DEBUG hdfs.StateChange: DIR* 
> FSDirectory.unprotectedAddFile: exception when add [path] to the file system
> org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException:
>  The directory item limit of [path] is exceeded: limit=1048576 items=1049332
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2060)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2112)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:2081)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addINode(FSDirectory.java:1900)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:368)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:365)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
> at 
> 
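A minimal, runnable sketch of the failure mode the two stack traces above 
describe (toy types and simplified names as assumptions, not the actual 
FSDirectory code): the add fails on the directory item limit, the exception is 
swallowed at a lower layer with only a debug log, and the null result is 
dereferenced one frame up.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class MaxDirItemsNpeDemo {
  static final int MAX_DIR_ITEMS = 2;               // stands in for the 1048576 limit
  static final Map<String, Integer> dir = new HashMap<>();

  // Mirrors unprotectedAddFile: a limit violation is caught here and
  // signalled to the caller only as a null result.
  static Integer addFile(String name) {
    try {
      if (dir.size() >= MAX_DIR_ITEMS) {
        throw new IllegalStateException("The directory item limit is exceeded");
      }
      dir.put(name, dir.size());
      return dir.get(name);
    } catch (IllegalStateException e) {
      System.out.println("DEBUG: exception when add " + name + ": " + e.getMessage());
      return null;
    }
  }

  // Mirrors applyEditLogOp -> createFileStatus: no null check, hence the NPE.
  static void applyAddOp(String name) {
    Integer inode = addFile(name);
    System.out.println("status of " + name + ": id=" + inode.intValue()); // NPE here
  }

  public static void main(String[] args) {
    applyAddOp("a");
    applyAddOp("b");
    applyAddOp("c");   // third add exceeds the limit -> NPE, like the SBNN crash
  }
}
{code}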

[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long

2016-08-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449865#comment-15449865
 ] 

Zhe Zhang commented on HDFS-9145:
-

Thanks [~kihwal] for noticing this. But are we still using CHANGES.txt to keep 
track of branch-2.7 changes? This JIRA is a special case because it had a 
CHANGES.txt entry when it went into branch-2. But for all new JIRAs targeting 
2.7.4 we are no longer adding CHANGES.txt entries anyway.

> Tracking methods that hold FSNamesytemLock for too long
> ---
>
> Key: HDFS-9145
> URL: https://issues.apache.org/jira/browse/HDFS-9145
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Jing Zhao
>Assignee: Mingliang Liu
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, 
> HDFS-9145.002.patch, HDFS-9145.003.patch, testlog.txt
>
>
> It would be helpful if we had a way to track (or at least log a message) 
> when some operation is holding the FSNamesystem lock for a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10799) NameNode should use loginUser(hdfs) to serve iNotify requests

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449857#comment-15449857
 ] 

Wei-Chiu Chuang commented on HDFS-10799:


Correct me if I am wrong, but wouldn't ugi.doAs() in Server$Handler.run at the 
NameNode throw a PriviledgedActionException if the client's ticket expires?
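A compilable sketch of the fix proposed in the description below. 
{{SecurityUtil.doAsLoginUser}} is the real Hadoop API; the helper methods here 
are simplified stand-ins (assumptions) for the NameNode's authorization and 
edit-reading paths.

{code:java}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.SecurityUtil;

public class INotifyReadSketch {
  public String getEditsFromTxid(final long txid) throws IOException {
    checkSuperuserPrivilege();   // still authorize as the calling client
    // Read edits as the NameNode's own login user (hdfs), so an expired
    // client tgt cannot break the connection to the journal nodes mid-read.
    return SecurityUtil.doAsLoginUser(
        new PrivilegedExceptionAction<String>() {
          @Override
          public String run() throws IOException {
            return readEditsFromTxid(txid);
          }
        });
  }

  private void checkSuperuserPrivilege() { /* stub for illustration */ }

  private String readEditsFromTxid(long txid) {
    return "edits@" + txid;      // stub for illustration
  }
}
{code}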

> NameNode should use loginUser(hdfs) to serve iNotify requests
> -
>
> Key: HDFS-10799
> URL: https://issues.apache.org/jira/browse/HDFS-10799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10799.001.patch
>
>
> When a NameNode serves iNotify requests from a client, it verifies the client 
> has superuser permission and then uses the client's Kerberos principal to 
> read edits from journal nodes.
> However, if the client does not renew its tgt ticket, the connection from 
> the NameNode to the journal nodes may fail, in which case the NameNode thinks 
> the edits are corrupt and prints a scary error message:
> "During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 11577603, but we thought we could read up to 
> transaction 11577606.  If you continue, metadata will be lost forever!"
> However, the edits are actually good. The NameNode _should not freak out 
> when an iNotify client's tgt ticket expires_.
> I think an easy solution to this bug is: after the NameNode verifies the 
> client has superuser permission, call {{SecurityUtil.doAsLoginUser}} and then 
> read the edits. This will make sure the operation does not fail due to an 
> expired client ticket.
> Excerpt of related logs:
> {noformat}
> 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: 
> PriviledgedActionException as:h...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: We encountered an error reading 
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy,
>  
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 11577603, but we thought we could read up to 
> transaction 11577606.  If you continue, metadata will be lost forever!
> 2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 112 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client 
> IP:port] Call#73 Retry#0
> java.io.IOException: We encountered an error reading 
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy,
>  
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 11577603, but we thought we could read up to 
> transaction 11577606.  If you continue, metadata will be lost forever!
> at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
> at 

[jira] [Commented] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7

2016-08-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449841#comment-15449841
 ] 

Vinitha Reddy Gankidi commented on HDFS-10809:
--

Thanks [~zhz]. I could not reproduce the test failures locally either. 

> getNumEncryptionZones causes NPE in branch-2.7
> --
>
> Key: HDFS-10809
> URL: https://issues.apache.org/jira/browse/HDFS-10809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.7.4
>Reporter: Zhe Zhang
>Assignee: Vinitha Reddy Gankidi
> Fix For: 2.7.4
>
> Attachments: HDFS-10809-branch-2.7.001.patch
>
>
> This bug was caused by the fact that we backported HDFS-10458 from trunk all 
> the way down to branch-2.7, but initially committed HDFS-8721 only down to 
> branch-2.8. So from branch-2.8 and up, the order is HDFS-8721 -> HDFS-10458, 
> while branch-2.7 has the reverse order. Hence the inconsistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2016-08-30 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449824#comment-15449824
 ] 

Roman Shaposhnik commented on HDFS-6994:


Note to everybody, and most recently [~sga]: this code is now being maintained 
as part of Apache HAWQ (incubating) at 
https://github.com/apache/incubator-hawq/tree/master/depends/libhdfs3; the 
original (Pivotal) repo is now obsolete. Also note that there is a YARN library 
as well, should you be interested in checking it out: 
https://github.com/apache/incubator-hawq/tree/master/depends/libyarn

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer 
> Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, as well as NameNode HA and 
> Kerberos authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3
> http://pivotal-data-attic.github.io/pivotalrd-libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10784) Implement WebHdfsFileSystem#listStatusIterator

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449812#comment-15449812
 ] 

Hadoop QA commented on HDFS-10784:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
6s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} hadoop-hdfs-project: The patch generated 4 new + 
252 unchanged - 3 fixed = 256 total (was 255) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
56s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 63m 
41s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 93m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10784 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12826210/HDFS-10784.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d0bb0ac586ee 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / af50860 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16578/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16578/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16578/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client 

[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2016-08-30 Thread Stephen G Ahlgren (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449810#comment-15449810
 ] 

Stephen G Ahlgren commented on HDFS-6994:
-

Thank you for the note Zhanwei. I've become familiar with libhdfs3 on Linux 
and would like to check in with you about your thoughts on the two branches 
(the hadoop hdfs-6994 branch vs. the pivotal apache-rpc-9 branch) before 
committing any changes.

There are Windows-specific changes in the hdfs-6994 branch that aren't 
reflected in the pivotal code, and namespace/method naming changes between the 
two as well. Should we try to reconcile the two branches, or focus efforts on 
the pivotal apache-rpc-9 branch, moving Windows-specific code in as needed 
from elsewhere?

Thank you kindly in advance, I just want to make sure we're on the same page 
before trundling forward.

Best,
Steve A.


> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer 
> Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, as well as NameNode HA and 
> Kerberos authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3
> http://pivotal-data-attic.github.io/pivotalrd-libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7

2016-08-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-10809:
-
   Resolution: Fixed
Fix Version/s: 2.7.4
   Status: Resolved  (was: Patch Available)

Just committed the patch. Thanks [~redvine] for the contribution.

> getNumEncryptionZones causes NPE in branch-2.7
> --
>
> Key: HDFS-10809
> URL: https://issues.apache.org/jira/browse/HDFS-10809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.7.4
>Reporter: Zhe Zhang
>Assignee: Vinitha Reddy Gankidi
> Fix For: 2.7.4
>
> Attachments: HDFS-10809-branch-2.7.001.patch
>
>
> This bug was caused by the fact that we backported HDFS-10458 from trunk all 
> the way down to branch-2.7, but initially committed HDFS-8721 only down to 
> branch-2.8. So from branch-2.8 and up, the order is HDFS-8721 -> HDFS-10458, 
> while branch-2.7 has the reverse order. Hence the inconsistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7

2016-08-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-10809:
-
Hadoop Flags: Reviewed

Thanks Vinitha. +1 on the patch. Reported test failures are not related and I 
verified that none can be reproduced locally. I'm committing the patch to 
branch-2.7 soon.

> getNumEncryptionZones causes NPE in branch-2.7
> --
>
> Key: HDFS-10809
> URL: https://issues.apache.org/jira/browse/HDFS-10809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.7.4
>Reporter: Zhe Zhang
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10809-branch-2.7.001.patch
>
>
> This bug was caused by the fact that we backported HDFS-10458 from trunk all 
> the way down to branch-2.7, but initially committed HDFS-8721 only down to 
> branch-2.8. So from branch-2.8 and up, the order is HDFS-8721 -> HDFS-10458, 
> while branch-2.7 has the reverse order. Hence the inconsistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10713) Throttle FsNameSystem lock warnings

2016-08-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449776#comment-15449776
 ] 

Arpit Agarwal commented on HDFS-10713:
--

Thanks for the patch [~hanishakoneru].
# We can do the timestamp checks just before the 
{{this.fsLock.writeLock().unlock()}} call, so the new state is protected by 
the write lock. The actual logging can be done outside the write lock, though. 
Then you don't need to make the members volatile anymore.
# We can cache the value returned by {{monotonicNow()}} instead of calling it 
twice. (A rough sketch of both points follows.)
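A minimal, self-contained sketch of the two suggestions, using a plain 
ReentrantReadWriteLock in place of FSNamesystem's lock; the names, thresholds, 
and the println stand-in for the logger are illustrative assumptions, not the 
committed patch.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ThrottledLockLoggingSketch {
  private static final long WARN_THRESHOLD_MS = 1000;            // warn if held > 1s
  private static final long SUPPRESS_WINDOW_MS = 10 * 60 * 1000; // one warning per 10 min

  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
  // Guarded by the write lock, so no volatile is needed (point 1).
  private long writeLockHeldAt;
  private long lastWarnAt;
  private long suppressedSinceLastWarn;

  public void writeLock() {
    fsLock.writeLock().lock();
    writeLockHeldAt = monotonicNow();
  }

  public void writeUnlock() {
    final long now = monotonicNow();          // cached, called only once (point 2)
    final long heldFor = now - writeLockHeldAt;
    boolean shouldWarn = false;
    long suppressed = 0;
    if (heldFor > WARN_THRESHOLD_MS) {
      if (now - lastWarnAt > SUPPRESS_WINDOW_MS) {
        shouldWarn = true;
        suppressed = suppressedSinceLastWarn;
        lastWarnAt = now;
        suppressedSinceLastWarn = 0;
      } else {
        suppressedSinceLastWarn++;
      }
    }
    fsLock.writeLock().unlock();
    if (shouldWarn) {                          // actual logging outside the lock
      System.out.printf("FSNamesystem write lock held for %d ms "
          + "(%d warnings suppressed)%n", heldFor, suppressed);
    }
  }

  private static long monotonicNow() {
    return System.nanoTime() / 1_000_000L;
  }
}
{code}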


> Throttle FsNameSystem lock warnings
> ---
>
> Key: HDFS-10713
> URL: https://issues.apache.org/jira/browse/HDFS-10713
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Arpit Agarwal
>Assignee: Hanisha Koneru
> Attachments: HDFS-10713.000.patch
>
>
> The NameNode logs a message if the FSNamesystem write lock is held by a 
> thread for over 1 second. These messages can be throttled to at most one 
> per x minutes to avoid potentially filling up NN logs. We can also log the 
> number of suppressed notices since the last log message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449734#comment-15449734
 ] 

Hudson commented on HDFS-10760:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10375 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10375/])
HDFS-10760. DataXceiver#run() should not log InvalidToken exception as 
(weichiu: rev c4ee6915a14e00342755d7cdcbf2d61518f306aa)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java


> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will have an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.
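A compilable sketch of the kind of change described above (not the committed 
diff; the op-processing shape and the slf4j logger are illustrative 
assumptions): treat InvalidToken as an expected condition and log it quietly, 
leaving the warning from checkAccess() as the only record.

{code:java}
import java.util.concurrent.Callable;
import org.apache.hadoop.security.token.SecretManager.InvalidToken;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class XceiverOpSketch {
  private static final Logger LOG = LoggerFactory.getLogger(XceiverOpSketch.class);

  void processOp(Callable<Void> op, String remoteAddress) {
    try {
      op.call();
    } catch (InvalidToken e) {
      // The client will simply refetch a fresh block token; not a server error.
      LOG.trace("Block token is expired for {}", remoteAddress, e);
    } catch (Exception e) {
      LOG.error("error processing operation from {}", remoteAddress, e);
    }
  }
}
{code}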



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10760:
---
Fix Version/s: 3.0.0-alpha2

> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will have an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10760:
---
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed it to trunk, branch-2 and branch-2.8.
Thanks [~panyuxuan] for contributing the patch, and [~brahmareddy] and 
[~shahrs87] for the comments!

> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Fix For: 2.8.0
>
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will have an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10760:
---
Hadoop Flags: Reviewed
Release Note: Log InvalidTokenException at debug level in DataXceiver#run().

> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will have an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10760:
---
Release Note: Log InvalidTokenException at trace level in 
DataXceiver#run().  (was: Log InvalidTokenException at debug level in 
DataXceiver#run().)

> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will have an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449672#comment-15449672
 ] 

Eric Badger commented on HDFS-10816:


The test failure is unrelated to the patch.

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then grabs the namesystem writelock. However, if the 
> replication monitor fires in between those two instructions, the test will 
> fail as it will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test, all that needs to be done is to turn off the replication 
> monitor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449652#comment-15449652
 ] 

Hadoop QA commented on HDFS-10816:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 21s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestPersistBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10816 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12826196/HDFS-10816.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 0538865c37d7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / af50860 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16577/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16577/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16577/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger

[jira] [Commented] (HDFS-10814) Add assertion for getNumEncryptionZones when no EZ is created

2016-08-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449598#comment-15449598
 ] 

Vinitha Reddy Gankidi commented on HDFS-10814:
--

Thanks Zhe and Andrew!

> Add assertion for getNumEncryptionZones when no EZ is created
> -
>
> Key: HDFS-10814
> URL: https://issues.apache.org/jira/browse/HDFS-10814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10814.001.patch
>
>
> HDFS-10809 adds an additional assertion to TestEncryptionZones to validate 
> that getNumEncryptionZones returns 0 if there is no EZ. This is a useful 
> check to add to trunk as well. 
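For concreteness, the kind of assertion being added; this is a sketch that 
assumes the existing TestEncryptionZones fixture, i.e. a running MiniDFSCluster 
named {{cluster}}.

{code:java}
import static org.junit.Assert.assertEquals;

// Before any encryption zone is created, the count should be zero.
assertEquals("Unexpected number of encryption zones!", 0,
    cluster.getNamesystem().getNumEncryptionZones());
{code}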



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-30 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449549#comment-15449549
 ] 

Chen Liang commented on HDFS-10804:
---

Hi [~fenghua_hu], thanks for the work!

Instead of adding a no-argument constructor and replacing the call in 
{{FsDatasetImpl}}, would you consider a constructor that accepts an 
AutoCloseable lock object and locks on that for {{FsDatasetImpl}}? This way 
{{ReplicaMap}} can still synchronize with {{FsDatasetImpl}}.
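A self-contained sketch of the lock-injection idea, using a plain ReentrantLock 
where the comment above suggests an AutoCloseable lock object, and a toy map in 
place of the real per-block-pool ReplicaMap (all names here are illustrative 
assumptions):

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class ReplicaMapSketch {
  private final ReentrantLock lock;           // shared or private, per constructor
  private final Map<Long, String> replicas = new HashMap<>();

  // Share an external lock (e.g. a dataset-wide lock), keeping the current
  // synchronization with FsDatasetImpl ...
  public ReplicaMapSketch(ReentrantLock sharedLock) {
    this.lock = sharedLock;
  }

  // ... or synchronize only on this map (finer granularity).
  public ReplicaMapSketch() {
    this(new ReentrantLock());
  }

  public String get(long blockId) {
    lock.lock();
    try {
      return replicas.get(blockId);
    } finally {
      lock.unlock();
    }
  }

  public void add(long blockId, String replica) {
    lock.lock();
    try {
      replicas.put(blockId, replica);
    } finally {
      lock.unlock();
    }
  }
}
{code}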

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch
>
>
> In the current implementation, ReplicaMap takes an external object as the 
> lock for synchronization.
> In FsDatasetImpl#FsDatasetImpl(), the object used for synchronization is 
> "this", i.e. the FsDatasetImpl itself: 
> volumeMap = new ReplicaMap(this);
> and in the private FsDatasetImpl#addVolume(), the "this" object is used for 
> synchronization as well:
> ReplicaMap tempVolumeMap = new ReplicaMap(this);
> I am not sure we really need an object as big as FsDatasetImpl for 
> ReplicaMap's synchronization. If it's not necessary, a finer-grained lock 
> could reduce lock contention on the FsDatasetImpl object and improve 
> performance. 
> Could you please give me some suggestions? Thanks a lot!
> Fenghua



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10784) Implement WebHdfsFileSystem#listStatusIterator

2016-08-30 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-10784:
---
Attachment: HDFS-10784.003.patch

Thanks for reviewing Xiao. Yep, we should hit HttpFS also, but if you don't 
mind, I'd like to do it in a separate JIRA. Feedback is otherwise addressed.

> Implement WebHdfsFileSystem#listStatusIterator
> --
>
> Key: HDFS-10784
> URL: https://issues.apache.org/jira/browse/HDFS-10784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.6.4
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HDFS-10784.001.patch, HDFS-10784.002.patch, 
> HDFS-10784.003.patch
>
>
> It would be nice to implement the iterative listStatus in WebHDFS so client 
> apps do not need to buffer the full file list for large directories.
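A sketch of how a client consumes the iterator once this is in place; 
{{FileSystem#listStatusIterator}} is the real API, while the URI and path here 
are illustrative.

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListStatusIteratorDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("webhdfs://nn.example.com:50070"), conf);
    // Entries arrive in batches, so a huge directory is never fully
    // buffered on the client.
    RemoteIterator<FileStatus> it = fs.listStatusIterator(new Path("/big/dir"));
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}
{code}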



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449489#comment-15449489
 ] 

Wei-Chiu Chuang commented on HDFS-10760:


Thanks for correcting me. You're right about that.
+1.

> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will have an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10813) DiskBalancer: Add the getNodeList method in Command

2016-08-30 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449485#comment-15449485
 ] 

Anu Engineer commented on HDFS-10813:
-

[~linyiqun] Thanks for the patch. I will commit this to trunk. But before I do 
that, could you please file a documentation JIRA that explains how this 
feature works? Since the change is in the CLI, it is good to have the 
documentation updated too. As I said, you can easily do that in a separate 
JIRA if you like.

> DiskBalancer: Add the getNodeList method in Command
> ---
>
> Key: HDFS-10813
> URL: https://issues.apache.org/jira/browse/HDFS-10813
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HDFS-10813.001.patch
>
>
> The method {{Command#getNodeList}} in DiskBalancer was added in HDFS-9545, 
> but it's never used. We can improve it in the following ways:
> 1. Change {{private}} to {{protected}} so that subclasses can use the method 
> in the future.
> 2. Reuse {{Command#getNodeList}} to construct a new method like 
> {{List getNodes(String listArg)}}. This method can be used for getting 
> multiple nodes in the future. For example, if we want to use {{hdfs 
> diskbalancer -report -node}} or {{hdfs diskbalancer -plan}} with multiple 
> specified nodes, that method can be used. Currently these commands only 
> support a single node.
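A rough sketch of what such a multi-node helper could look like (hypothetical 
names, not the actual Command.java code): parse a comma-separated node argument 
into individual node names.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class NodeArgSketch {
  static List<String> getNodes(String listArg) {
    List<String> nodes = new ArrayList<>();
    if (listArg == null || listArg.isEmpty()) {
      return nodes;                       // no node argument given
    }
    for (String node : listArg.split(",")) {
      String trimmed = node.trim();
      if (!trimmed.isEmpty()) {
        nodes.add(trimmed);
      }
    }
    return nodes;
  }

  public static void main(String[] args) {
    // e.g. hdfs diskbalancer -report -node dn1.example.com,dn2.example.com
    System.out.println(getNodes("dn1.example.com, dn2.example.com"));
  }
}
{code}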



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-10817 started by Erik Krogen.
--
> Add Logging for Long-held NN Read Locks
> ---
>
> Key: HDFS-10817
> URL: https://issues.apache.org/jira/browse/HDFS-10817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>
> Right now the Namenode will log when a write lock is held for a long time to 
> help track methods which are causing expensive delays. Let's do the same for 
> read locks since these operations may also be expensive/long and cause 
> delays. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10817) Add Logging for Long-held NN Read Locks

2016-08-30 Thread Erik Krogen (JIRA)
Erik Krogen created HDFS-10817:
--

 Summary: Add Logging for Long-held NN Read Locks
 Key: HDFS-10817
 URL: https://issues.apache.org/jira/browse/HDFS-10817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: logging, namenode
Reporter: Erik Krogen
Assignee: Erik Krogen


Right now the Namenode will log when a write lock is held for a long time to 
help track methods which are causing expensive delays. Let's do the same for 
read locks since these operations may also be expensive/long and cause delays. 
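
For comparison, a minimal sketch of what the read-lock side might look like, 
mirroring the write-lock logging (field names and the threshold are 
illustrative assumptions; a ThreadLocal is used because many reader threads 
can hold the lock at once):

{code}
// Illustrative sketch, not the actual patch.
private final ThreadLocal<Long> readLockHeldTimeStamp =
    ThreadLocal.withInitial(() -> Long.MAX_VALUE);
private static final long READ_LOCK_REPORTING_THRESHOLD_MS = 5000;

public void readLock() {
  this.fsLock.readLock().lock();
  if (this.fsLock.getReadHoldCount() == 1) {
    // Only time the outermost acquisition of the reentrant read lock.
    readLockHeldTimeStamp.set(Time.monotonicNow());
  }
}

public void readUnlock() {
  final boolean outermost = this.fsLock.getReadHoldCount() == 1;
  final long heldMs = outermost
      ? Time.monotonicNow() - readLockHeldTimeStamp.get() : 0;
  this.fsLock.readLock().unlock();
  if (outermost && heldMs > READ_LOCK_REPORTING_THRESHOLD_MS) {
    LOG.info("Namenode read lock held for " + heldMs + " ms");
  }
}
{code}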



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10814) Add assertion for getNumEncryptionZones when no EZ is created

2016-08-30 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-10814:
---
Fix Version/s: (was: 3.0.0-alpha1)
   3.0.0-alpha2

> Add assertion for getNumEncryptionZones when no EZ is created
> -
>
> Key: HDFS-10814
> URL: https://issues.apache.org/jira/browse/HDFS-10814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10814.001.patch
>
>
> HDFS-10809 adds an additional assertion to TestEncryptionZones to validate 
> that getNumEncryptionZones returns 0 if there is no EZ. This is a useful 
> check to add to trunk as well. 
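
The check itself is a one-liner; a minimal sketch, assuming the counter is 
reachable through the test cluster's namesystem as in similar tests:

{code}
// Illustrative; the exact accessor used in TestEncryptionZones is assumed.
assertEquals("Unexpected number of encryption zones!", 0,
    cluster.getNamesystem().getNumEncryptionZones());
{code}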



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-10816:
---
Attachment: HDFS-10816.001.patch

Attaching a patch to stop the replication monitor in the @Before method of the 
test class. All of the tests exercise portions of the replication monitor code, 
so it would be prudent for the replication monitor not to run while they 
execute.
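
A minimal sketch of one way the setup could look (assuming the test's existing 
{{conf}}, {{cluster}}, and {{NUM_OF_DATANODES}} fields; the attached patch may 
instead stop the monitor thread directly):

{code}
@Before
public void setup() throws Exception {
  conf = new HdfsConfiguration();
  // The replication monitor fires every 3 seconds by default; push the
  // interval far beyond the test runtime so it effectively never runs.
  conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY,
      Integer.MAX_VALUE);
  cluster = new MiniDFSCluster.Builder(conf).numDataNodes(NUM_OF_DATANODES)
      .build();
  cluster.waitActive();
}
{code}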

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then subsequently gets the namesystem writelock. However, if the 
> replication monitor fires in between those two instructions, the test will 
> fail as it will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test all that needs to be done is to turn off the replication 
> monitor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10814) Add assertion for getNumEncryptionZones when no EZ is created

2016-08-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449389#comment-15449389
 ] 

Zhe Zhang commented on HDFS-10814:
--

Thanks Andrew for taking care of this.

> Add assertion for getNumEncryptionZones when no EZ is created
> -
>
> Key: HDFS-10814
> URL: https://issues.apache.org/jira/browse/HDFS-10814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10814.001.patch
>
>
> HDFS-10809 adds an additional assertion to TestEncryptionZones to validate 
> that getNumEncryptionZones returns 0 if there is no EZ. This is a useful 
> check to add to trunk as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-10816:
---
Status: Patch Available  (was: Open)

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then subsequently gets the namesystem writelock. However, if the 
> replication monitor fires in between those two instructions, the test will 
> fail as it will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test all that needs to be done is to turn off the replication 
> monitor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10814) Add assertion for getNumEncryptionZones when no EZ is created

2016-08-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449385#comment-15449385
 ] 

Andrew Wang commented on HDFS-10814:


I'm bumping this to 3.0.0-alpha2 fixVersion since I just sent out the 
3.0.0-alpha1 RC0, but we can pull it in if there's another RC. Thanks folks!

> Add assertion for getNumEncryptionZones when no EZ is created
> -
>
> Key: HDFS-10814
> URL: https://issues.apache.org/jira/browse/HDFS-10814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10814.001.patch
>
>
> HDFS-10809 adds an additional assertion to TestEncryptionZones to validate 
> that getNumEncryptionZones returns 0 if there is no EZ. This is a useful 
> check to add to trunk as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Eric Badger (JIRA)
Eric Badger created HDFS-10816:
--

 Summary: TestComputeInvalidateWork#testDatanodeReRegistration 
fails due to race between test and replication monitor
 Key: HDFS-10816
 URL: https://issues.apache.org/jira/browse/HDFS-10816
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


{noformat}
java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
expected:<3> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
{noformat}

The test fails because of a race condition between the test and the replication 
monitor. The default replication monitor interval is 3 seconds, which is just 
about how long the test normally takes to run. The test deletes a file and then 
subsequently gets the namesystem writelock. However, if the replication monitor 
fires in between those two instructions, the test will fail as it will itself 
invalidate one of the blocks. This can be easily reproduced by removing the 
sleep() in the ReplicationMonitor's run() method in BlockManager.java, so that 
the replication monitor executes as quickly as possible and exacerbates the 
race. 

To fix the test all that needs to be done is to turn off the replication 
monitor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449359#comment-15449359
 ] 

Rushabh S Shah commented on HDFS-10760:
---

I also checked for the same thing when I was reviewing.
But it turns out that the checkAccess method is not in the try block, so it 
will just throw the InvalidToken exception all the way back to DataXceiver#run.
So I think the patch should be good.
Correct me if I am wrong.
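
A rough sketch of the flow being described (simplified; not the attached 
patch): because checkAccess() sits outside any swallowing try block, the 
InvalidToken reaches run(), where a dedicated catch can log it quietly.

{code}
// Simplified sketch of DataXceiver#run(); names and structure condensed.
public void run() {
  Op op = null;
  try {
    op = readOp();
    processOp(op);  // readBlock() -> checkAccess() may throw InvalidToken
  } catch (InvalidToken it) {
    // checkAccess() already logged a WARN; skip the duplicate ERROR.
    LOG.debug("Invalid or expired block token", it);
  } catch (Throwable t) {
    LOG.error(datanode + ":DataXceiver error processing "
        + (op == null ? "unknown" : op.name()) + " operation", t);
  }
}
{code}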


> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will show an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449340#comment-15449340
 ] 

Wei-Chiu Chuang edited comment on HDFS-10760 at 8/30/16 3:40 PM:
-

[~panyuxuan] thanks for reporting the issue and creating the patch.
In fact, DataXceiver#checkAccess() is used by multiple methods, and I think it 
would be beneficial if you could improve the other callers as well.

For example, I can see that DataXceiver#writeBlock calls checkAccess, and if an 
InvalidToken exception is thrown, it also logs a message at error level.

{code}
try {
  ...
} catch (IOException e) {
  if (isClient) {
    LOG.error(datanode + ":Exception transfering block " +
        block + " to mirror " + mirrorNode + ": " + e);
    throw e;
  }
}
{code}


was (Author: jojochuang):
[~panyuxuan] thanks for reporting the issue and creating the patch.
In fact, DataXceiver#checkAccess() is used by multiple methods and I think it 
would be beneficial if you can also fix other as well.

For example, I can see DataXceiver#writeBlock calls checkAccess, and if an 
InvalidTokenException is thrown, it also logs a message at error level.

{code}
try {
  ...
} catch (IOException e) {
  if (isClient) {
    LOG.error(datanode + ":Exception transfering block " +
        block + " to mirror " + mirrorNode + ": " + e);
    throw e;
  }
}
{code}

> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will show an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449340#comment-15449340
 ] 

Wei-Chiu Chuang commented on HDFS-10760:


[~panyuxuan] thanks for reporting the issue and creating the patch.
In fact, DataXceiver#checkAccess() is used by multiple methods, and I think it 
would be beneficial if you could fix the other callers as well.

For example, I can see that DataXceiver#writeBlock calls checkAccess, and if an 
InvalidToken exception is thrown, it also logs a message at error level.

{code}
try {
  ...
} catch (IOException e) {
  if (isClient) {
    LOG.error(datanode + ":Exception transfering block " +
        block + " to mirror " + mirrorNode + ": " + e);
    throw e;
  }
}
{code}
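
One possible shape of that improvement (an illustrative fragment under the 
same assumptions, not a committed patch) is to peel the token failure off 
before the generic handler:

{code}
try {
  // ... connect to the mirror and transfer the block ...
} catch (InvalidToken it) {
  // A token problem is a client condition already logged at WARN by
  // checkAccess(); rethrow it without the ERROR-level log.
  throw it;
} catch (IOException e) {
  if (isClient) {
    LOG.error(datanode + ":Exception transfering block " +
        block + " to mirror " + mirrorNode + ": " + e);
    throw e;
  }
}
{code}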

> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will show an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error

2016-08-30 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449140#comment-15449140
 ] 

Rushabh S Shah commented on HDFS-10760:
---

[~panyuxuan]: Thanks for reporting the issue.
Patch looks good to me as well.
+1 non-binding.

> DataXceiver#run() should not log InvalidToken exception as an error
> ---
>
> Key: HDFS-10760
> URL: https://issues.apache.org/jira/browse/HDFS-10760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HADOOP-13492.patch, HDFS-10760-1.patch
>
>
> DataXceiver#run() just logs the InvalidToken exception as an error.
> When a client has an expired token and simply refetches a new one, the DN 
> log will show an error like the one below:
> {noformat}
> 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - 
> XXX:50010:DataXceiver error processing READ_BLOCK operation  src: 
> /10.17.1.5:38844 dst: /10.17.1.5:50010
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with 
> block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, 
> userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, 
> blockId=1077120201, access modes=[READ]) is expired.
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301)
> at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is not a server error, and DataXceiver#checkAccess() has already 
> logged the InvalidToken as a warning.
> A simple fix is to catch the InvalidToken exception in DataXceiver#run(), 
> keeping only the warning logged by DataXceiver#checkAccess() in the DN log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


