[jira] [Updated] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-28 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10338:
-
Attachment: HDFS-10338.001.patch

Updated the v001 patch with some small changes.

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
> Attachments: HDFS-10338.001.patch
>
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRC retrievals 
> fails, an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.
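
For illustration only, here is a minimal sketch of the strict behaviour proposed above: the caught {{IOException}} is re-thrown instead of being swallowed, and a {{null}} checksum fails the check rather than passing it. This is not the attached patch; the method name and signature are assumptions modelled on {{DistCpUtils#checksumsAreEqual(...)}}, with the usual {{org.apache.hadoop.fs}} types ({{FileSystem}}, {{Path}}, {{FileChecksum}}) assumed to be in scope.
{code}
// Sketch only, not the HDFS-10338 patch.
private static boolean strictChecksumsAreEqual(FileSystem sourceFS, Path source,
    FileChecksum sourceChecksum, FileSystem targetFS, Path target)
    throws IOException {
  final FileChecksum targetChecksum;
  try {
    sourceChecksum = sourceChecksum != null
        ? sourceChecksum : sourceFS.getFileChecksum(source);
    targetChecksum = targetFS.getFileChecksum(target);
  } catch (IOException e) {
    // Re-throw instead of masking the retrieval failure.
    throw new IOException("Unable to retrieve checksum for " + source
        + " or " + target, e);
  }
  if (sourceChecksum == null || targetChecksum == null) {
    // The file system could not provide a checksum; under a strict policy
    // this fails the comparison instead of silently passing it.
    throw new IOException("Checksum not available for " + source
        + " or " + target);
  }
  return sourceChecksum.equals(targetChecksum);
}
{code}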



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-28 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10338:
-
Attachment: (was: HDFS-10338.001.patch)

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRC retrievals 
> fails, an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9545) DiskBalancer : Add Plan Command

2016-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263504#comment-15263504
 ] 

Hadoop QA commented on HDFS-9545:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 24s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 4s 
{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
22s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s 
{color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 
8s {color} | {color:green} The applied patch generated 0 new + 92 unchanged - 1 
fixed = 92 total (was 93) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 40s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 49s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 184m 21s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | 
hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | 

[jira] [Updated] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-28 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10338:
-
Attachment: HDFS-10338.001.patch

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
> Attachments: HDFS-10338.001.patch
>
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRC retrievals 
> fails, an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-28 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10338:
-
Status: Patch Available  (was: Open)

Attaching an initial patch for this; thanks for reviewing.

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
> Attachments: HDFS-10338.001.patch
>
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRC retrievals 
> fails, an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7269) NN and DN don't check whether corrupted blocks reported by clients are actually corrupted

2016-04-28 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263477#comment-15263477
 ] 

Ming Ma commented on HDFS-7269:
---

We might not need another datanode command; instead we can use the existing 
block transfer command to ask the suspect datanode to transfer the block to 
another datanode. The new target datanode would then do checksum verification, 
just like DFSClient does, and notify the NN accordingly. The NN would only 
trust reportBadBlocks coming from a datanode.

In addition, the NN can decide to ignore bad clients that keep incorrectly 
reporting corrupted replicas. That way, the NN can skip the unnecessary "have 
the DN recheck" step.
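
To make the verification step concrete, here is a rough, self-contained sketch of what the receiving datanode would do: recompute a checksum over the replica it just received and report the block as bad only when the recomputed value disagrees with the expected one. This is not actual DataNode code; the class and helper names are hypothetical, and {{java.util.zip.CRC32}} stands in for the real block checksum machinery.
{code}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

public class ReplicaVerifier {
  /** Recompute a CRC32 over the received replica file (hypothetical helper). */
  static long recomputeCrc(String replicaPath) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buf = new byte[64 * 1024];
    try (InputStream in = Files.newInputStream(Paths.get(replicaPath))) {
      int n;
      while ((n = in.read(buf)) > 0) {
        crc.update(buf, 0, n);
      }
    }
    return crc.getValue();
  }

  /** Only report the block as corrupt when the recomputed checksum disagrees. */
  static boolean shouldReportBadBlock(String replicaPath, long expectedCrc)
      throws IOException {
    return recomputeCrc(replicaPath) != expectedCrc;
  }
}
{code}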


> NN and DN don't check whether corrupted blocks reported by clients are 
> actually corrupted
> -
>
> Key: HDFS-7269
> URL: https://issues.apache.org/jira/browse/HDFS-7269
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> We had a case where the client machine had memory issue and thus failed the 
> checksum validation of a given block for all its replicas. So the client 
> ended up informing NN about the corrupted blocks for all DNs via 
> reportBadBlocks. However, the block isn't corrupted on any of the DNs. You 
> can still use DFSClient to read the block. But in order to get rid of NN's 
> warning message for corrupt block, we either do a NN fail over, or repair the 
> file via a) copy the file somewhere, b) remove the file, c) copy the file 
> back.
> It will be useful if NN and DN can validate client's report. In fact, there 
> is a comment in NamenodeRpcServer about this.
> {noformat}
>   /**
>* The client has detected an error on the specified located blocks 
>* and is reporting them to the server.  For now, the namenode will 
>* mark the block as corrupt.  In the future we might 
>* check the blocks are actually corrupt. 
>*/
> {noformat}
> To allow system to recover from invalid client report quickly, we can support 
> automatic recovery or manual admins command.
> 1. we can have NN send a new DatanodeCommand like ValidateBlockCommand. DN 
> will notify the validate result via IBR and new 
> ReceivedDeletedBlockInfo.BlockStatus.VALIDATED_BLOCK.
> 2. Some new admins command to move corrupted blocks out of BM's 
> CorruptReplicasMap and UnderReplicatedBlocks.
> Appreciate any input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10340) data node sudden killed

2016-04-28 Thread tu nguyen khac (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263439#comment-15263439
 ] 

tu nguyen khac commented on HDFS-10340:
---

I tried reading all the system log files and cut off logins for all users who 
SSH into this server, but nothing changed (the datanode was still killed). I 
switched to Ubuntu 14 and everything is okay now.

> data node sudden killed 
> 
>
> Key: HDFS-10340
> URL: https://issues.apache.org/jira/browse/HDFS-10340
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
> Environment: Ubuntu 16.04 LTS, RAM 16 GB, CPU cores: 8, HDD 100 GB, 
> Hadoop 2.6.0
>Reporter: tu nguyen khac
>Priority: Critical
>
> I tried to set up a new datanode using Ubuntu 16 
> and have it join an existing Hadoop HDFS cluster (there are 10 nodes in 
> this cluster and they all run on CentOS 6). 
> But when I try to bootstrap this node, after about 10 or 20 minutes I get 
> these strange errors: 
> 2016-04-26 20:12:09,394 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.3.24.65:55323, dest: /10.3.24.197:50010, bytes: 79902, op: HDFS_WRITE, 
> cliID: DFSClient_NONMAPREDUCE_1379996362_1, offset: 0, srvID: 
> 225f5b43-1dd3-4ac6-88d2-1e8d27dba55b, blockid: 
> BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, duration: 
> 15331628
> 2016-04-26 20:12:09,394 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder: BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2016-04-26 20:12:25,410 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038502_789829
> 2016-04-26 20:12:25,411 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832
> 2016-04-26 20:13:18,546 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Scheduling blk_1074038502_789829 file 
> /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
>  for deletion
> 2016-04-26 20:13:18,562 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Deleted BP-352432948-10.3.24.65-1433821675295 blk_1074038502_789829 file 
> /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
> 2016-04-26 20:15:46,481 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
> 2016-04-26 20:15:46,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down DataNode at bigdata-dw-24-197/10.3.24.197
> /



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-28 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263428#comment-15263428
 ] 

Mingliang Liu commented on HDFS-10335:
--

The failing tests are not related. Specifically, 
{{hadoop.hdfs.TestRollingUpgradeRollback}} fails because of a port in use, and 
{{hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl}} is a known bug 
which is tracked by [HDFS-10260].

We did not add a new test as the code path is covered by existing tests. We 
manually tested the patch and the Mover was ~60X faster than before, though 
this is not a general case, as all of its ARCHIVE datanodes were newly added to 
the same rack.

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-10335.000.patch, HDFS-10335.000.patch
>
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> Specifically, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of other available storage groups. We need an algorithm which can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.
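
As a generic illustration of the "approximately even" distribution being asked for here (not the actual HDFS-10335 patch), one simple approach is to pick a random candidate among the matching targets, or to shuffle the candidate list up front, instead of always taking the first match. The class and method names below are hypothetical.
{code}
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class EvenTargetChooser {
  private final Random random = new Random();

  /**
   * Instead of always returning the first matching storage group, pick a
   * random one so repeated calls spread move tasks approximately evenly.
   */
  public <T> T chooseTarget(List<T> matchingTargets) {
    if (matchingTargets.isEmpty()) {
      return null;
    }
    return matchingTargets.get(random.nextInt(matchingTargets.size()));
  }

  /**
   * Alternative: shuffle the candidate list once up front and keep the
   * existing "first match wins" logic; the first match is then a random one.
   */
  public <T> void shuffleCandidates(List<T> candidates) {
    Collections.shuffle(candidates, random);
  }
}
{code}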



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263401#comment-15263401
 ] 

Hadoop QA commented on HDFS-10335:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 22s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_92. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 56s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 138m 20s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_92 Failed junit tests | hadoop.hdfs.TestFileAppend |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
|   | hadoop.hdfs.TestRollingUpgradeRollback |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801344/HDFS-10335.000.patch |
| JIRA Issue | 

[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263367#comment-15263367
 ] 

Hadoop QA commented on HDFS-9543:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
46s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
58s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: patch generated 0 
new + 180 unchanged - 1 fixed = 180 total (was 181) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 24s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 51s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 17s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | hadoop.hdfs.tools.TestDFSAdmin |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
|   | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.balancer.TestBalancer |
| JDK 

[jira] [Updated] (HDFS-9545) DiskBalancer : Add Plan Command

2016-04-28 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9545:
---
Attachment: HDFS-9545-HDFS-1312.002.patch

This patch takes care of comments from [~eddyxu].

> DiskBalancer : Add Plan Command
> ---
>
> Key: HDFS-9545
> URL: https://issues.apache.org/jira/browse/HDFS-9545
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9545-HDFS-1312.001.patch, 
> HDFS-9545-HDFS-1312.002.patch
>
>
> Allows the user to create a plan and persist it. This is useful if users want 
> to evaluate the actions of the disk balancer before running the balancing job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9545) DiskBalancer : Add Plan Command

2016-04-28 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9545:
---
Status: Patch Available  (was: Open)

> DiskBalancer : Add Plan Command
> ---
>
> Key: HDFS-9545
> URL: https://issues.apache.org/jira/browse/HDFS-9545
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9545-HDFS-1312.001.patch, 
> HDFS-9545-HDFS-1312.002.patch
>
>
> Allows the user to create a plan and persist it. This is useful if users want 
> to evaluate the actions of the disk balancer before running the balancing job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10344) DistributedFileSystem#getTrashRoots should skip encryption zone that does not have .Trash

2016-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263313#comment-15263313
 ] 

Hadoop QA commented on HDFS-10344:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 1s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 33s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 55m 53s 
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 168m 8s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests 

[jira] [Updated] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10335:
-
Attachment: HDFS-10335.000.patch

Re-uploading the same patch to trigger Jenkins.

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-10335.000.patch, HDFS-10335.000.patch
>
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> Specifically, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of other available storage groups. We need an algorithm which can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9709) DiskBalancer : Add tests for disk balancer using a Mock Mover class.

2016-04-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9709:

Fix Version/s: HDFS-1312

> DiskBalancer : Add tests for disk balancer using a Mock Mover class.
> 
>
> Key: HDFS-9709
> URL: https://issues.apache.org/jira/browse/HDFS-9709
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: HDFS-1312
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-1312
>
> Attachments: HDFS-9709-HDFS-1312.001.patch, 
> HDFS-9709-HDFS-1312.002.patch, HDFS-9709-HDFS-1312.003.patch, 
> HDFS-9709-HDFS-1312.004.patch
>
>
> Add test cases for DiskBalancer using a Mock Mover class. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9856) Suppress Jenkins warning for sample JSON file

2016-04-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9856:

Fix Version/s: HDFS-1312

> Suppress Jenkins warning for sample JSON file
> -
>
> Key: HDFS-9856
> URL: https://issues.apache.org/jira/browse/HDFS-9856
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-1312
>Reporter: Arpit Agarwal
>Assignee: Xiaobing Zhou
> Fix For: HDFS-1312
>
> Attachments: HDFS-9856-HDFS-1312.000.patch
>
>
> Jenkins runs generate a warning for the sample JSON plan as follows:
> {code}
> Lines that start with ? in the ASF License  report indicate files that do 
> not have an Apache license header:
>  !? 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/diskBalancer/data-cluster-3node-3disk.json
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9921) Stop tracking CHANGES.txt in the HDFS-1312 feature branch.

2016-04-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9921:

Fix Version/s: HDFS-1312

> Stop tracking CHANGES.txt in the HDFS-1312 feature branch.
> --
>
> Key: HDFS-9921
> URL: https://issues.apache.org/jira/browse/HDFS-9921
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Xiaobing Zhou
>Priority: Trivial
> Fix For: HDFS-1312
>
> Attachments: HDFS-9921-HDFS-1312.000.patch
>
>
> Remove tracking of CHANGES.txt in HDFS-1312 since we removed it in trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9543:

Fix Version/s: HDFS-1312

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-1312
>
> Attachments: HDFS-9543-HDFS-1312.001.patch, 
> HDFS-9543-HDFS-1312.002.patch, HDFS-9543-HDFS-1312.003.patch, 
> HDFS-9543-HDFS-1312.004.patch
>
>
> This patch adds the actual mover logic to the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-28 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9543:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to the feature branch. Thank you [~arpitagarwal] & [~eddyxu] for your 
reviews.

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch, 
> HDFS-9543-HDFS-1312.002.patch, HDFS-9543-HDFS-1312.003.patch, 
> HDFS-9543-HDFS-1312.004.patch
>
>
> This patch adds the actual mover logic to the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-28 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263212#comment-15263212
 ] 

Anu Engineer commented on HDFS-9543:


[~arpitagarwal] & [~eddyxu] Thanks for the code review and comments. The test 
failures are not related to this patch. I will commit this patch shortly.

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch, 
> HDFS-9543-HDFS-1312.002.patch, HDFS-9543-HDFS-1312.003.patch, 
> HDFS-9543-HDFS-1312.004.patch
>
>
> This patch adds the actual mover logic to the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263179#comment-15263179
 ] 

Hadoop QA commented on HDFS-9543:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
59s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: patch generated 0 
new + 180 unchanged - 1 fixed = 180 total (was 181) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 52s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 14s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 175m 15s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork |
|   | hadoop.hdfs.tools.TestDFSAdmin |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
|   | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.TestFileAppend |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.balancer.TestBalancer |
| JDK v1.8.0_91 Timed out junit tests | 

[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-28 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263155#comment-15263155
 ] 

Arpit Agarwal commented on HDFS-9543:
-

Thanks [~anu]. +1 pending Jenkins.

The computeDelay calculation is correct, as Anu explained above; I was not 
reading it correctly.

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch, 
> HDFS-9543-HDFS-1312.002.patch, HDFS-9543-HDFS-1312.003.patch, 
> HDFS-9543-HDFS-1312.004.patch
>
>
> This patch adds the actual mover logic to the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-28 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9543:
---
Attachment: HDFS-9543-HDFS-1312.004.patch

Updating the patch based on an offline discussion with Arpit. This patch 
correctly takes care of openPoolIters and closePoolIters.

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch, 
> HDFS-9543-HDFS-1312.002.patch, HDFS-9543-HDFS-1312.003.patch, 
> HDFS-9543-HDFS-1312.004.patch
>
>
> This patch adds the actual mover logic to the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9809) Abstract implementation-specific details from the datanode

2016-04-28 Thread Virajith Jalaparti (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263105#comment-15263105
 ] 

Virajith Jalaparti commented on HDFS-9809:
--

I am attaching a new patch which includes the following changes in addition to 
the ones posted earlier. (We also ported the earlier changes to a more recent 
version of trunk). 

# Using {{ReplicaInfo.getState()}} to get the state of a {{ReplicaInfo}} 
instead of using {{instanceof}}. A related change is to use the class 
{{ReplicaInfo}} to refer to the replica objects instead of the particular 
subclass (this required adding additional abstract functions to the 
{{ReplicaInfo}} class). 
# Addition of a {{ReplicaBuilder}} and replacing calls to the constructors of 
different {{ReplicaInfo}} subclasses ({{ReplicaInPipeline}}, 
{{ReplicaBeingWritten}}, etc.) with calls to the {{ReplicaBuilder}} with the 
appropriate parameters ({{ReplicaState}}, {{blockId}} etc.) set. 
# Addition of a {{FsVolumeImplBuilder}} and replacing calls to the constructor 
of {{FsVolumeImpl}} with those to the builder. 

The idea behind the changes in (1) and (2) above is to add a new 
{{ProvidedReplica}} class (an implementation of {{ReplicaInfo}}) which can be: 
(a) used to represent replicas stored in a provided storage (described in more 
detail in the design documentation of HDFS-9806).
(b) treated as any other {{ReplicaInfo}} in the rest of the code. This would 
avoid changes to the rest of the Datanode as part of HDFS-9806. 
(c) written to using the existing replication pipeline, without implementing a 
separate write pipeline for HDFS-9806. 

The idea behind (3) is to construct the appropriate volume based on the 
{{StorageLocation}} (specified using {{dfs.datanode.data.dir}}). For example, 
as part of HDFS-9806, if a {{StorageLocation}} is of a PROVIDED type, we would 
construct a {{ProvidedVolumeImpl}}. Otherwise, a {{FsVolumeImpl}} would be 
built. 

Each location is opaque, and resolved by its volume. This allows the Datanode 
to serve data from volumes that are not local filesystems.
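
As a purely illustrative aside, here is a tiny self-contained sketch of the builder idea in point (2) above: callers describe the replica they want (state, block id, etc.) and a single builder produces it, instead of invoking a specific subclass constructor. All names and fields below are made up for the sketch; the real {{ReplicaBuilder}} in the patch has a different surface.

{code}
import java.util.Objects;

enum ReplicaState { FINALIZED, RBW, RWR, RUR, TEMPORARY }

final class ReplicaMeta {
  final ReplicaState state;
  final long blockId;
  final long genStamp;

  private ReplicaMeta(ReplicaState state, long blockId, long genStamp) {
    this.state = state; this.blockId = blockId; this.genStamp = genStamp;
  }

  static Builder builder() { return new Builder(); }

  static final class Builder {
    private ReplicaState state;
    private long blockId;
    private long genStamp;

    Builder setState(ReplicaState s) { this.state = s; return this; }
    Builder setBlockId(long id) { this.blockId = id; return this; }
    Builder setGenerationStamp(long gs) { this.genStamp = gs; return this; }

    ReplicaMeta build() {
      // The builder, not the caller, decides which concrete flavor to create.
      Objects.requireNonNull(state, "replica state must be set");
      return new ReplicaMeta(state, blockId, genStamp);
    }
  }

  @Override public String toString() { return state + "/blk_" + blockId + "_" + genStamp; }
}

public class ReplicaBuilderSketch {
  public static void main(String[] args) {
    ReplicaMeta rbw = ReplicaMeta.builder()
        .setState(ReplicaState.RBW)
        .setBlockId(1073741825L)
        .setGenerationStamp(1001L)
        .build();
    System.out.println(rbw); // prints RBW/blk_1073741825_1001
  }
}
{code}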

> Abstract implementation-specific details from the datanode
> --
>
> Key: HDFS-9809
> URL: https://issues.apache.org/jira/browse/HDFS-9809
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode, fs
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-9809.001.patch, HDFS-9809.002.patch
>
>
> Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FSVolumeImpl etc.) 
> implicitly assume that blocks are stored in java.io.File(s) and that volumes 
> are divided into directories. We propose to abstract these details, which 
> would help in supporting other storages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9809) Abstract implementation-specific details from the datanode

2016-04-28 Thread Virajith Jalaparti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-9809:
-
Attachment: HDFS-9809.002.patch

> Abstract implementation-specific details from the datanode
> --
>
> Key: HDFS-9809
> URL: https://issues.apache.org/jira/browse/HDFS-9809
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode, fs
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-9809.001.patch, HDFS-9809.002.patch
>
>
> Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FSVolumeImpl etc.) 
> implicitly assume that blocks are stored in java.io.File(s) and that volumes 
> are divided into directories. We propose to abstract these details, which 
> would help in supporting other storages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-9958.
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3

Committed this to trunk through branch-2.7. Thanks for working on this, 
[~kshukla] and thanks for reviews and feedback, [~walter.k.su].

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.7.3
>
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958-branch-2.001.patch, 
> HDFS-9958-branch-2.7.001.patch, HDFS-9958.001.patch, HDFS-9958.002.patch, 
> HDFS-9958.003.patch, HDFS-9958.004.patch, HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263089#comment-15263089
 ] 

Kihwal Lee commented on HDFS-9958:
--

+1 branch-2.7 patch.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958-branch-2.001.patch, 
> HDFS-9958-branch-2.7.001.patch, HDFS-9958.001.patch, HDFS-9958.002.patch, 
> HDFS-9958.003.patch, HDFS-9958.004.patch, HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10344) DistributedFileSystem#getTrashRoots should skip encryption zone that does not have .Trash

2016-04-28 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10344:
--
Affects Version/s: 2.8.0
   Status: Patch Available  (was: Open)

> DistributedFileSystem#getTrashRoots should skip encryption zone that does not 
> have .Trash
> -
>
> Key: HDFS-10344
> URL: https://issues.apache.org/jira/browse/HDFS-10344
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10344.00.patch
>
>
> HDFS-8831 added trash support for encryption zones. For encryption zone that 
> does not have .Trash created yet, DistributedFileSystem#getTrashRoots should 
> skip rather than call listStatus() and break by FileNotFoundException. I will 
> post a patch for the fix shortly. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263077#comment-15263077
 ] 

Kihwal Lee commented on HDFS-9958:
--

+1 for the branch-2 version.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958-branch-2.001.patch, 
> HDFS-9958-branch-2.7.001.patch, HDFS-9958.001.patch, HDFS-9958.002.patch, 
> HDFS-9958.003.patch, HDFS-9958.004.patch, HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10344) DistributedFileSystem#getTrashRoots should skip encryption zone that does not have .Trash

2016-04-28 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10344:
--
Attachment: HDFS-10344.00.patch

Attaching a patch with unit tests. 
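
For reference, a hedged sketch of the guard described in this issue: skip an encryption zone whose .Trash does not exist yet instead of letting listStatus() throw FileNotFoundException. This is only an approximation for illustration, not the code in HDFS-10344.00.patch; the helper name and the way zones are passed in are assumptions.

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TrashRootsSketch {
  /** Collect per-user trash roots, skipping zones whose .Trash has not been created yet. */
  static Collection<FileStatus> trashRootsForZones(FileSystem fs, List<Path> ezRoots)
      throws IOException {
    List<FileStatus> roots = new ArrayList<>();
    for (Path ezRoot : ezRoots) {
      Path trash = new Path(ezRoot, ".Trash");
      if (!fs.exists(trash)) {
        continue; // no .Trash yet: skip instead of breaking on FileNotFoundException
      }
      for (FileStatus userTrash : fs.listStatus(trash)) {
        roots.add(userTrash);
      }
    }
    return roots;
  }
}
{code}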

> DistributedFileSystem#getTrashRoots should skip encryption zone that does not 
> have .Trash
> -
>
> Key: HDFS-10344
> URL: https://issues.apache.org/jira/browse/HDFS-10344
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10344.00.patch
>
>
> HDFS-8831 added trash support for encryption zones. For encryption zone that 
> does not have .Trash created yet, DistributedFileSystem#getTrashRoots should 
> skip rather than call listStatus() and break by FileNotFoundException. I will 
> post a patch for the fix shortly. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system

2016-04-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-9806:

Attachment: HDFS-9806-design.001.pdf

Attached first draft of design doc. We've also pushed the PoC we worked on last 
year to 
[github|https://github.com/Microsoft-CISL/hadoop-prototype/tree/f1eca68d38651b426cdf867f522e9ba05ed1442e].
 It includes a tool (likely throwaway) to scan a remote namespace and output an 
FSImage. One can start a NN that will serve the contents of the external store 
through HDFS. The implementation in trunk will not track the PoC, but it is 
still useful for context.

> Allow HDFS block replicas to be provided by an external storage system
> --
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Chris Douglas
> Attachments: HDFS-9806-design.001.pdf
>
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate/and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10344) DistributedFileSystem#getTrashRoots should skip encryption zone that does not have .Trash

2016-04-28 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-10344:
-

 Summary: DistributedFileSystem#getTrashRoots should skip 
encryption zone that does not have .Trash
 Key: HDFS-10344
 URL: https://issues.apache.org/jira/browse/HDFS-10344
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


HDFS-8831 added trash support for encryption zones. For encryption zone that 
does not have .Trash created yet, DistributedFileSystem#getTrashRoots should 
skip rather than call listStatus() and break by FileNotFoundException. I will 
post a patch for the fix shortly. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated HDFS-9958:
--
Attachment: HDFS-9958-branch-2.7.001.patch
HDFS-9958-branch-2.001.patch

Attaching branch-2 and branch-2.7 patches.

The branch-2 patch applies cleanly to branch-2.8.

[~kihwal], request for comments/review. Thanks a lot!

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958-branch-2.001.patch, 
> HDFS-9958-branch-2.7.001.patch, HDFS-9958.001.patch, HDFS-9958.002.patch, 
> HDFS-9958.003.patch, HDFS-9958.004.patch, HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-28 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9543:
---
Attachment: HDFS-9543-HDFS-1312.003.patch

bq. In copyBlocks, the openPoolIters call should be inside the while loop, just 
before you open the try block.

Just wanted to make sure that I understand this point clearly before I make any 
fixes. copyBlocks takes a source volume and a destination volume. openPoolIters 
opens all block pools on the source volume, and then we try to fetch a block 
from those pools in a round-robin fashion.

I do agree that openPoolIters should be protected by a try, with {{finally}} 
calling into closePoolIters. In fact, I wrote it that way at first, but I 
thought the nesting of code was getting out of hand. If you are saying that I 
need to switch to having a try before openPoolIters, I will fix that in the 
next update.

Moving it down into the while loop might be inefficient, in the sense that we 
would have to close the iterators and reopen them between each block fetch.
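
For context, a minimal self-contained sketch of the structure being discussed: the pool iterators are opened once per source volume and a try/finally guarantees they are closed, while the copy loop round-robins over them. The names ({{BlockIter}}, {{copyBlocks}} here) are stand-ins for the sketch, not the DiskBalancer classes in the patch.

{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class CopyBlocksSketch {
  interface BlockIter extends Closeable {
    String nextBlock() throws IOException; // returns null when the pool is exhausted
  }

  static void copyBlocks(List<BlockIter> poolIters) throws IOException {
    // Iterators were opened by the caller (the openPoolIters step); close them exactly once.
    try {
      boolean progress = true;
      while (progress) {              // round-robin over the block pools
        progress = false;
        for (BlockIter it : poolIters) {
          String block = it.nextBlock();
          if (block != null) {
            progress = true;
            // move the block from the source volume to the destination volume here
          }
        }
      }
    } finally {
      closePoolIters(poolIters);
    }
  }

  static void closePoolIters(List<BlockIter> poolIters) {
    for (BlockIter it : poolIters) {
      try {
        it.close();
      } catch (IOException ignored) {
        // best-effort close
      }
    }
  }
}
{code}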

bq. I didn't understand the new computeDelay calculation. Your original 
calculation (v1 patch) looked correct, all that remained was to subtract 
timeUsed and check for negative result.

That is exactly what I was attempting to do. Here is what the old and new 
calculations do. The old calculation looks at a copy, say a 50 MB block was 
moved, and says “50 MB moved, we are supposed to do 10 MB/s, so let us sleep 
for 5 seconds”. As you pointed out, that was clearly wrong since it did not 
take into account the time spent doing the copy. The new calculation says “we 
copied 50 MB in 3 seconds, we are supposed to copy 50 MB in 5 seconds, so let 
us subtract 5 - 3 and sleep for 2 seconds”.
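
For illustration, a minimal sketch of the new delay computation described above; the method and parameter names are assumptions for the sketch, not the exact DiskBalancer code.

{code}
public class ComputeDelaySketch {
  /**
   * @param bytesCopied        bytes moved in the last copy
   * @param timeUsedMillis     wall-clock time the copy actually took
   * @param maxDiskBandwidth   allowed bandwidth, in MB/s
   * @return how long to sleep, in milliseconds (never negative)
   */
  static long computeDelay(long bytesCopied, long timeUsedMillis, long maxDiskBandwidth) {
    long megabytesCopied = bytesCopied / (1024 * 1024);
    long expectedMillis = (megabytesCopied * 1000L) / maxDiskBandwidth;
    return Math.max(0L, expectedMillis - timeUsedMillis);
  }

  public static void main(String[] args) {
    // 50 MB copied in 3 seconds with a 10 MB/s cap: expected 5 s, so sleep 2 s.
    System.out.println(computeDelay(50L * 1024 * 1024, 3000L, 10L)); // prints 2000
  }
}
{code}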


bq. Thanks for the pointer to DirectoryScanner#scan. The synchronization is 
needed only when making changes to the in-memory block map.  
Fixed, thanks for explaining this to me. I have removed the synchronization.

bq. We can also log the maximum error count value here for quick reference. 
Also a typo cound -> count.
Fixed both issues.

bq. I think the -blockpools flag just restricts the balancing to a subset of 
blockpools.
Fixed.


> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch, 
> HDFS-9543-HDFS-1312.002.patch, HDFS-9543-HDFS-1312.003.patch
>
>
> This patch adds the actual mover logic to the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262709#comment-15262709
 ] 

Kuhu Shukla commented on HDFS-9958:
---

[~brahmareddy] I am apprehensive about using an ArrayList instead of the plain 
array; when I looked at an old thread about this code, it discussed how using 
lists would not be as fast. I can try to dig that out.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10324) Trash directory in an encryption zone should be pre-created with sticky bit

2016-04-28 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262700#comment-15262700
 ] 

Wei-Chiu Chuang commented on HDFS-10324:


Thanks Xiaoyu and Andrew. I'm currently preoccupied with other work, but I 
expect to post a new patch next week.

> Trash directory in an encryption zone should be pre-created with sticky bit
> ---
>
> Key: HDFS-10324
> URL: https://issues.apache.org/jira/browse/HDFS-10324
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.8.0
> Environment: CDH5.7.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10324.001.patch, HDFS-10324.002.patch, 
> HDFS-10324.003.patch
>
>
> We encountered a bug in HDFS-8831:
> After HDFS-8831, a deleted file in an encryption zone is moved to a .Trash 
> subdirectory within the encryption zone.
> However, if this .Trash subdirectory is not created beforehand, it will be 
> created and owned by the first user who deleted a file, with permission 
> drwx--. This creates a serious bug because any other non-privileged user 
> will not be able to delete any files within the encryption zone, because they 
> do not have the permission to move directories to the trash directory.
> We should fix this bug, by pre-creating the .Trash directory with sticky bit.
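
As an illustration of the proposed fix, here is a hedged sketch of pre-creating .Trash inside the zone with /tmp-style 1777 permissions (sticky bit set). The helper name is made up and this is not the actual HDFS-10324 patch.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class TrashProvisionSketch {
  static void provisionTrash(FileSystem fs, Path ezRoot) throws IOException {
    Path trash = new Path(ezRoot, ".Trash");
    if (!fs.exists(trash)) {
      fs.mkdirs(trash);
      // rwxrwxrwt: any user can create their own trash subdirectory, but only the
      // owner of an entry (or the superuser) can delete or rename it.
      fs.setPermission(trash, new FsPermission((short) 01777));
    }
  }
}
{code}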



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262694#comment-15262694
 ] 

Brahma Reddy Battula commented on HDFS-9958:


I think we can fix this in a simple way that includes the HDFS-10343 fix here as well:

{code}
-final DatanodeStorageInfo[] machines = new DatanodeStorageInfo[numMachines];
+//final DatanodeStorageInfo[] machines = new DatanodeStorageInfo[numMachines];
+List<DatanodeStorageInfo> machinesList = new ArrayList<>(numMachines);
 final byte[] blockIndices = blk.isStriped() ? new byte[numMachines] : null;
 int j = 0, i = 0;
 if (numMachines > 0) {
@@ -1048,7 +1049,9 @@ private LocatedBlock createLocatedBlock(final BlockInfo blk, final long pos)
 final DatanodeDescriptor d = storage.getDatanodeDescriptor();
 final boolean replicaCorrupt = corruptReplicas.isReplicaCorrupt(blk, d);
 if (isCorrupt || (!replicaCorrupt)) {
-  machines[j++] = storage;
+  //machines[j++] = storage;
+  j++;
+  machinesList.add(storage);
   // TODO this can be more efficient
   if (blockIndices != null) {
 byte index = ((BlockInfoStriped) blk).getStorageBlockIndex(storage);
@@ -1058,6 +1061,7 @@ private LocatedBlock createLocatedBlock(final BlockInfo blk, final long pos)
 }
   }
 }
+final DatanodeStorageInfo[] machines = machinesList.toArray(new DatanodeStorageInfo[j]);
{code}

correct me if I am wrong.. thanks..


> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at 

[jira] [Updated] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated HDFS-9958:
--
Status: Open  (was: Patch Available)

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262619#comment-15262619
 ] 

Hudson commented on HDFS-9958:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9688 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9688/])
HDFS-9958. BlockManager#createLocatedBlocks can throw NPE for (kihwal: rev 
6243eabb48390fffada2418ade5adf9e0766afbe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCorruption.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-10343) BlockManager#createLocatedBlocks may return blocks on failed storages

2016-04-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reassigned HDFS-10343:
--

Assignee: Kuhu Shukla

> BlockManager#createLocatedBlocks may return blocks on failed storages
> -
>
> Key: HDFS-10343
> URL: https://issues.apache.org/jira/browse/HDFS-10343
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Kuhu Shukla
>
> Storage state is ignored when building the machines list.  Failed storage 
> removal is not immediate so clients may be directed to bad locations.  The 
> client recovers but it's less than ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-10342) BlockManager#createLocatedBlocks should not check corrupt replicas if none are corrupt

2016-04-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reassigned HDFS-10342:
--

Assignee: Kuhu Shukla

> BlockManager#createLocatedBlocks should not check corrupt replicas if none 
> are corrupt
> --
>
> Key: HDFS-10342
> URL: https://issues.apache.org/jira/browse/HDFS-10342
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.7.0
>Reporter: Daryn Sharp
>Assignee: Kuhu Shukla
>
> {{corruptReplicas#isReplicaCorrupt(block, node)}} is called for every node 
> while populating the machines array.  There's no need to invoke the method if 
> {{corruptReplicas#numCorruptReplicas(block)}} returned 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262603#comment-15262603
 ] 

Kihwal Lee commented on HDFS-9958:
--

[~kshukla] please post patches for other target branches.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262598#comment-15262598
 ] 

Kihwal Lee commented on HDFS-9958:
--

Thanks for filing HDFS-10342 and HDFS-10343, Daryn.
+1 from me too. I will commit it soon.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-2173) saveNamespace should not throw IOE when only one storage directory fails to write VERSION file

2016-04-28 Thread Andras Bokor (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262549#comment-15262549
 ] 

Andras Bokor commented on HDFS-2173:


[~tlipcon] I checked this test and it currently fails before it can throw an 
IOE or RTE, so the test itself is currently incorrect.
The problem is in this block:
{code}case WRITE_STORAGE_ONE:
  // The spy throws on exception on one particular storage directory
  doAnswer(new FaultySaveImage(true))
.when(spyStorage).writeProperties((StorageDirectory)anyObject());
  // TODO: unfortunately this fails -- should be improved.
  // See HDFS-2173.
  shouldFail = true;
  break;{code}

Here we mock {{writeProperties}} with one parameter, but above, in the answer 
method, we try to get the second argument with 
{{StorageDirectory sd = (StorageDirectory)args[1];}}
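
For illustration, a minimal sketch of how the answer could read the single 
argument instead, following the understanding below that the second 
writeProperties call should throw an RTE (hypothetical code, not the actual 
FaultySaveImage implementation):
{code}
// Assumed imports: static org.mockito.Mockito.doAnswer,
// static org.mockito.Matchers.anyObject,
// org.mockito.invocation.InvocationOnMock, org.mockito.stubbing.Answer.
doAnswer(new Answer<Void>() {
  private int calls = 0;

  @Override
  public Void answer(InvocationOnMock invocation) throws Throwable {
    Object[] args = invocation.getArguments();
    // writeProperties(StorageDirectory) takes a single parameter,
    // so the directory is args[0], not args[1].
    StorageDirectory sd = (StorageDirectory) args[0];
    if (++calls == 2) {
      // Fail only the second call, leaving the first directory usable.
      throw new RuntimeException("Injected fault on " + sd);
    }
    invocation.callRealMethod();
    return null;
  }
}).when(spyStorage).writeProperties((StorageDirectory) anyObject());
{code}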

I would fix this in a separate JIRA, but I am not sure I fully understand the 
happy path of the test case.
As I understand it, when writeProperties is called a second time an RTE should 
be thrown.

Is that the original purpose of the test?

> saveNamespace should not throw IOE when only one storage directory fails to 
> write VERSION file
> --
>
> Key: HDFS-2173
> URL: https://issues.apache.org/jira/browse/HDFS-2173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Edit log branch (HDFS-1073), 0.23.0
>Reporter: Todd Lipcon
>
> This JIRA tracks a TODO in TestSaveNamespace. Currently, if, while writing 
> the VERSION files in the storage directories, one of the directories fails, 
> the entire operation throws IOE. This is unnecessary -- instead, just that 
> directory should be marked as failed.
> This is targeted to be fixed _after_ HDFS-1073 is merged to trunk, since it 
> never causes data loss, and would rarely occur in practice (the dir would 
> have to fail between writing the fsimage file and writing VERSION).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10343) BlockManager#createLocatedBlocks may return blocks on failed storages

2016-04-28 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10343:
--

 Summary: BlockManager#createLocatedBlocks may return blocks on 
failed storages
 Key: HDFS-10343
 URL: https://issues.apache.org/jira/browse/HDFS-10343
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.6.0
Reporter: Daryn Sharp


Storage state is ignored when building the machines list.  Failed storage 
removal is not immediate so clients may be directed to bad locations.  The 
client recovers but it's less than ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10342) BlockManager#createLocatedBlocks should not check corrupt replicas if none are corrupt

2016-04-28 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10342:
--

 Summary: BlockManager#createLocatedBlocks should not check corrupt 
replicas if none are corrupt
 Key: HDFS-10342
 URL: https://issues.apache.org/jira/browse/HDFS-10342
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 2.7.0
Reporter: Daryn Sharp


{{corruptReplicas#isReplicaCorrupt(block, node)}} is called for every node 
while populating the machines array.  There's no need to invoke the method if 
{{corruptReplicas#numCorruptReplicas(block)}} returned 0.
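
A toy sketch of the suggested short-circuit (plain Java with made-up types; not 
the actual BlockManager/CorruptReplicasMap code):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy sketch of the short-circuit proposed above; not BlockManager code. */
public class CorruptLookupSketch {
  private final Map<Long, Set<String>> corruptReplicasMap = new HashMap<>();

  int numCorruptReplicas(long blockId) {
    Set<String> nodes = corruptReplicasMap.get(blockId);
    return nodes == null ? 0 : nodes.size();
  }

  boolean isReplicaCorrupt(long blockId, String node) {
    Set<String> nodes = corruptReplicasMap.get(blockId);
    return nodes != null && nodes.contains(node);
  }

  int countUsableMachines(long blockId, String[] storages) {
    // One lookup up front; the per-node check is skipped entirely when the
    // block has no corrupt replicas at all.
    boolean anyCorrupt = numCorruptReplicas(blockId) > 0;
    int usable = 0;
    for (String node : storages) {
      boolean replicaCorrupt = anyCorrupt && isReplicaCorrupt(blockId, node);
      if (!replicaCorrupt) {
        usable++;
      }
    }
    return usable;
  }

  public static void main(String[] args) {
    CorruptLookupSketch sketch = new CorruptLookupSketch();
    System.out.println(sketch.countUsableMachines(1L, new String[] {"dn1", "dn2", "dn3"}));
  }
}
{code}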



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6489) DFS Used space is not correct computed on frequent append operations

2016-04-28 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-6489:
---
Attachment: HDFS-6489.005.patch

Here's a patch which reduces the dfsUsed space when a finalized replica is 
converted to an RBW (replica-being-written) on an append so that on 
finalization, the counts are correct. This lets {{testFrequentAppend}} pass.

I'll also look at whether it makes sense to add only the delta to dfsUsed 
during finalize.
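
As a self-contained sketch of the accounting effect (plain Java arithmetic 
only, not FsDatasetImpl code; the 60M block and 10-byte appends are taken from 
the report quoted below):
{code}
/**
 * Two accounting strategies for repeated small appends: releasing the
 * finalized bytes when the replica is reopened as RBW and charging the new
 * finalized length keeps dfsUsed equal to the on-disk size, while re-charging
 * the full block length on every append overstates it massively.
 */
public class DfsUsedAppendSketch {
  public static void main(String[] args) {
    long blockLen = 60L << 20; // 60M finalized block
    long appendLen = 10;       // bytes added per append
    int appends = 1000;

    long actual = blockLen;    // what is really on disk
    long buggy = blockLen;     // re-charge the (growing) block length every time
    long fixed = blockLen;     // release on RBW conversion, charge on finalize

    for (int i = 0; i < appends; i++) {
      long before = actual;
      actual += appendLen;
      buggy += actual;          // adds roughly 60M per 10-byte append
      fixed += actual - before; // adds exactly the 10 appended bytes
    }
    System.out.println("on disk : " + actual);
    System.out.println("buggy   : " + buggy);
    System.out.println("fixed   : " + fixed);
  }
}
{code}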

> DFS Used space is not correct computed on frequent append operations
> 
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0, 2.7.1, 2.7.2
>Reporter: stanley shi
>Assignee: Weiwei Yang
> Attachments: HDFS-6489.001.patch, HDFS-6489.002.patch, 
> HDFS-6489.003.patch, HDFS-6489.004.patch, HDFS-6489.005.patch, HDFS6489.java
>
>
> The current implementation of the Datanode will increase the DFS used space 
> on each block write operation. This is correct in most scenarios (create new 
> file), but sometimes it behaves incorrectly (append small data to a large 
> block).
> For example, I have a file with only one block (say, 60M). Then I try to 
> append to it very frequently, but each time I append only 10 bytes;
> then on each append, dfs used will be increased by the length of the 
> block (60M), not the actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a large 
> number of files (1000+). Assume the block size is 32M (half of the default 
> value); then the dfs used will be increased by 1000*32M = 32G on each append to 
> the files, but actually I only write 10K bytes; this will cause the datanode 
> to report insufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sda3  16G  2.9G   13G  20% /
> tmpfs 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1  97M   32M   61M  35% /boot
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-28 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262494#comment-15262494
 ] 

Daryn Sharp commented on HDFS-9958:
---

+1 Looks good. Will file a follow-up for other improvements.

Regarding whether the client should pass the storage id or not, with the 
current design it definitely should not.  The client requests a block from the 
datanode.  The datanode serves from whatever storage it's on.  Now let's say 
the client thinks the block is on S1, but it's really on S3.  The storage 
cannot be honored.  Only the DN should report the storage because it's 
authoritative.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes whose storage state is NORMAL, so in the case 
> where the corrupt replica is on a failed storage, numCorruptNodes will be 
> zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes counts all nodes/storages irrespective of the storage 
> state, so numMachines will include such (failed) nodes. The assert would fail 
> only if assertion errors are enabled; otherwise the code goes ahead and tries 
> to create a LocatedBlock from a {{machines}} array slot that was never 
> populated.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-28 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262383#comment-15262383
 ] 

Ravi Prakash commented on HDFS-10220:
-

Is there a need to renew the lease? That would prevent it from being checked 
for another {{LEASE_HARDLIMIT_PERIOD}}. Could we simply change {{Lease 
leaseToCheck = sortedLeases.poll();}} to {{Lease leaseToCheck = 
sortedLeases.peek();}}? And add it to {{removing}} in case {{completed}} is 
true. I'd also propose to remove the {{removing.add(id);}} from the catch 
block. [~wheat9], [~jingzhao] you made these changes in HDFS-6757 . Do you have 
any comments?
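
A tiny illustration of the peek-vs-poll point (plain Java with a stand-in 
release method; not the actual LeaseManager code):
{code}
import java.util.PriorityQueue;

/**
 * With peek(), a lease that could not be released stays at the head of the
 * queue and is retried on the next pass; with poll(), it would be dropped
 * from the queue even though it was never released.
 */
public class PeekVsPollSketch {
  public static void main(String[] args) {
    PriorityQueue<String> sortedLeases = new PriorityQueue<>();
    sortedLeases.add("lease-1");
    sortedLeases.add("lease-2");

    String leaseToCheck = sortedLeases.peek();  // inspect without removing
    boolean completed = tryRelease(leaseToCheck);
    if (completed) {
      sortedLeases.poll();                      // only remove once actually released
    }
    System.out.println("remaining leases: " + sortedLeases);
  }

  private static boolean tryRelease(String lease) {
    // Stand-in for internalReleaseLease(); pretend this round did not complete.
    return false;
  }
}
{code}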

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, 
> threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by 
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are lots of threads blocked due to 
> a lock.
> Looking at the code, there is a lock taken by the LeaseManager.Monitor when 
> some leases must be released. Due to the really big number of leases to be 
> released, the namenode took too long to release them, blocking all other 
> tasks and making the zkfc think that the namenode was not available/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check, so the lock won't be held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-10340) data node sudden killed

2016-04-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-10340.
---
Resolution: Invalid

Please use the mailing list for future discussions of this nature.

> data node sudden killed 
> 
>
> Key: HDFS-10340
> URL: https://issues.apache.org/jira/browse/HDFS-10340
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
> Environment: Ubuntu 16.04 LTS , RAM 16g , cpu core : 8 , hdd 100gb, 
> hadoop 2.6.0
>Reporter: tu nguyen khac
>Priority: Critical
>
> I tried to set up a new data node using Ubuntu 16 
> and get it to join an existing Hadoop HDFS cluster (there are 10 nodes in 
> this cluster and they all run on CentOS 6). 
> But when I try to bootstrap this node, after about 10 or 20 minutes I get 
> these strange errors: 
> 2016-04-26 20:12:09,394 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.3.24.65:55323, dest: /10.3.24.197:50010, bytes: 79902, op: HDFS_WRITE, 
> cliID: DFSClient_NONMAPREDUCE_1379996362_1, offset: 0, srvID: 
> 225f5b43-1dd3-4ac6-88d2-1e8d27dba55b, blockid: 
> BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, duration: 
> 15331628
> 2016-04-26 20:12:09,394 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder: BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2016-04-26 20:12:25,410 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038502_789829
> 2016-04-26 20:12:25,411 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832
> 2016-04-26 20:13:18,546 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Scheduling blk_1074038502_789829 file 
> /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
>  for deletion
> 2016-04-26 20:13:18,562 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Deleted BP-352432948-10.3.24.65-1433821675295 blk_1074038502_789829 file 
> /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
> 2016-04-26 20:15:46,481 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
> 2016-04-26 20:15:46,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down DataNode at bigdata-dw-24-197/10.3.24.197
> /



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10276) Different results for exist call for file.ext/name

2016-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262094#comment-15262094
 ] 

Hadoop QA commented on HDFS-10276:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 55s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 59s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 59s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 8s 
{color} | {color:red} root: patch generated 2 new + 114 unchanged - 0 fixed = 
116 total (was 114) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 37s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 27s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 47s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 50s {color} 
| {color:red} hadoop-hdfs in the patch failed with 

[jira] [Updated] (HDFS-6489) DFS Used space is not correct computed on frequent append operations

2016-04-28 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-6489:
--
Attachment: HDFS-6489.004.patch

Revised the patch so it can be applied to trunk. I'm not sure if there is a 
better approach to resolve this; I've attached the patch and would appreciate 
any comments.

> DFS Used space is not correct computed on frequent append operations
> 
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0, 2.7.1, 2.7.2
>Reporter: stanley shi
>Assignee: Weiwei Yang
> Attachments: HDFS-6489.001.patch, HDFS-6489.002.patch, 
> HDFS-6489.003.patch, HDFS-6489.004.patch, HDFS6489.java
>
>
> The current implementation of the Datanode will increase the DFS used space 
> on each block write operation. This is correct in most scenarios (create new 
> file), but sometimes it behaves incorrectly (append small data to a large 
> block).
> For example, I have a file with only one block (say, 60M). Then I try to 
> append to it very frequently, but each time I append only 10 bytes;
> then on each append, dfs used will be increased by the length of the 
> block (60M), not the actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a large 
> number of files (1000+). Assume the block size is 32M (half of the default 
> value); then the dfs used will be increased by 1000*32M = 32G on each append to 
> the files, but actually I only write 10K bytes; this will cause the datanode 
> to report insufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sda3  16G  2.9G   13G  20% /
> tmpfs 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1  97M   32M   61M  35% /boot
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10341) Add a metric to expose the timeout number of pending replication blocks

2016-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262056#comment-15262056
 ] 

Hadoop QA commented on HDFS-10341:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 34s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 21s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 9s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 54s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 53s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 12s 
{color} | {color:red} root: patch generated 1 new + 321 unchanged - 0 fixed = 
322 total (was 321) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 42s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_91. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 9s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 14s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 18s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
31s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 210m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| 

[jira] [Commented] (HDFS-10328) Add cache pool level replication management

2016-04-28 Thread xupeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261936#comment-15261936
 ] 

xupeng commented on HDFS-10328:
---

Hi [~cmccabe] [~andrew.wang]

Could you please review this issue and patch? Thanks.

> Add cache pool level replication management 
> ---
>
> Key: HDFS-10328
> URL: https://issues.apache.org/jira/browse/HDFS-10328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-10328.001.patch
>
>
> For now, hdfs cacheadmin cannot set a replication num for a cache pool. Each 
> cache directive added to the cache pool has to set its own replication num 
> individually. 
> Consider this situation: we add daily Hive tables into the cache pool "hive". 
> Each time I have to set the same replication num for every table directive in 
> the same cache pool.  
> I think we should enable setting a replication num on the cache pool, so that 
> every cache directive in the pool can inherit the replication configuration 
> from the pool. A cache directive can still override the replication 
> configuration explicitly by passing "-replication" to the "add & modify 
> directive" commands from the CLI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10340) data node sudden killed

2016-04-28 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261938#comment-15261938
 ] 

Walter Su commented on HDFS-10340:
--

I don't think it's an issue. SIGTERM comes from the outside. The signal is 
probably emitted by some script, command, or daemon process.

> data node sudden killed 
> 
>
> Key: HDFS-10340
> URL: https://issues.apache.org/jira/browse/HDFS-10340
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
> Environment: Ubuntu 16.04 LTS , RAM 16g , cpu core : 8 , hdd 100gb, 
> hadoop 2.6.0
>Reporter: tu nguyen khac
>Priority: Critical
>
> I tried to set up a new data node using Ubuntu 16 
> and get it to join an existing Hadoop HDFS cluster (there are 10 nodes in 
> this cluster and they all run on CentOS 6). 
> But when I try to bootstrap this node, after about 10 or 20 minutes I get 
> these strange errors: 
> 2016-04-26 20:12:09,394 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.3.24.65:55323, dest: /10.3.24.197:50010, bytes: 79902, op: HDFS_WRITE, 
> cliID: DFSClient_NONMAPREDUCE_1379996362_1, offset: 0, srvID: 
> 225f5b43-1dd3-4ac6-88d2-1e8d27dba55b, blockid: 
> BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, duration: 
> 15331628
> 2016-04-26 20:12:09,394 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder: BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2016-04-26 20:12:25,410 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038502_789829
> 2016-04-26 20:12:25,411 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832
> 2016-04-26 20:13:18,546 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Scheduling blk_1074038502_789829 file 
> /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
>  for deletion
> 2016-04-26 20:13:18,562 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Deleted BP-352432948-10.3.24.65-1433821675295 blk_1074038502_789829 file 
> /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
> 2016-04-26 20:15:46,481 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
> 2016-04-26 20:15:46,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down DataNode at bigdata-dw-24-197/10.3.24.197
> /



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6489) DFS Used space is not correct computed on frequent append operations

2016-04-28 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-6489:
--
Status: In Progress  (was: Patch Available)

> DFS Used space is not correct computed on frequent append operations
> 
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.2, 2.7.1, 2.2.0
>Reporter: stanley shi
>Assignee: Weiwei Yang
> Attachments: HDFS-6489.001.patch, HDFS-6489.002.patch, 
> HDFS-6489.003.patch, HDFS6489.java
>
>
> The current implementation of the Datanode will increase the DFS used space 
> on each block write operation. This is correct in most scenarios (create new 
> file), but sometimes it behaves incorrectly (append small data to a large 
> block).
> For example, I have a file with only one block (say, 60M). Then I try to 
> append to it very frequently, but each time I append only 10 bytes;
> then on each append, dfs used will be increased by the length of the 
> block (60M), not the actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a large 
> number of files (1000+). Assume the block size is 32M (half of the default 
> value); then the dfs used will be increased by 1000*32M = 32G on each append to 
> the files, but actually I only write 10K bytes; this will cause the datanode 
> to report insufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sda3  16G  2.9G   13G  20% /
> tmpfs 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1  97M   32M   61M  35% /boot
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261889#comment-15261889
 ] 

Hadoop QA commented on HDFS-10337:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 7m 31s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 
11 unchanged - 1 fixed = 12 total (was 12) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 46s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 113m 49s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_92. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 21s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 241m 16s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.hdfs.tools.offlineEditsViewer.StatisticsEditsVisitor.getStatisticsString()
  At StatisticsEditsVisitor.java:then immediately reboxed in 
org.apache.hadoop.hdfs.tools.offlineEditsViewer.StatisticsEditsVisitor.getStatisticsString()
  At StatisticsEditsVisitor.java:[line 116] 

[jira] [Updated] (HDFS-10276) Different results for exist call for file.ext/name

2016-04-28 Thread Yuanbo Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-10276:
--
Status: Patch Available  (was: In Progress)

> Different results for exist call for file.ext/name
> --
>
> Key: HDFS-10276
> URL: https://issues.apache.org/jira/browse/HDFS-10276
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kevin Cox
>Assignee: Yuanbo Liu
> Attachments: HDFS-10276.001.patch, HDFS-10276.002.patch, 
> HDFS-10276.003.patch
>
>
> Given you have a file {{/file}} an existence check for the path 
> {{/file/whatever}} will give different responses for different 
> implementations of FileSystem.
> LocalFileSystem will return false while DistributedFileSystem will throw 
> {{org.apache.hadoop.security.AccessControlException: Permission denied: ..., 
> access=EXECUTE, ...}}
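
A minimal reproduction sketch of the behaviour described above (assuming 
{{/file}} already exists as a regular file and the default configuration 
resolves to either a LocalFileSystem or a DistributedFileSystem):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExistsUnderFileSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // LocalFileSystem returns false here, while DistributedFileSystem may
    // throw AccessControlException (access=EXECUTE on the parent path, which
    // is a regular file rather than a directory).
    boolean exists = fs.exists(new Path("/file/whatever"));
    System.out.println("exists = " + exists);
  }
}
{code}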



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10276) Different results for exist call for file.ext/name

2016-04-28 Thread Yuanbo Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-10276:
--
Attachment: HDFS-10276.003.patch

> Different results for exist call for file.ext/name
> --
>
> Key: HDFS-10276
> URL: https://issues.apache.org/jira/browse/HDFS-10276
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kevin Cox
>Assignee: Yuanbo Liu
> Attachments: HDFS-10276.001.patch, HDFS-10276.002.patch, 
> HDFS-10276.003.patch
>
>
> Given you have a file {{/file}} an existence check for the path 
> {{/file/whatever}} will give different responses for different 
> implementations of FileSystem.
> LocalFileSystem will return false while DistributedFileSystem will throw 
> {{org.apache.hadoop.security.AccessControlException: Permission denied: ..., 
> access=EXECUTE, ...}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10276) Different results for exist call for file.ext/name

2016-04-28 Thread Yuanbo Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-10276:
--
Status: In Progress  (was: Patch Available)

> Different results for exist call for file.ext/name
> --
>
> Key: HDFS-10276
> URL: https://issues.apache.org/jira/browse/HDFS-10276
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kevin Cox
>Assignee: Yuanbo Liu
> Attachments: HDFS-10276.001.patch, HDFS-10276.002.patch, 
> HDFS-10276.003.patch
>
>
> Given you have a file {{/file}} an existence check for the path 
> {{/file/whatever}} will give different responses for different 
> implementations of FileSystem.
> LocalFileSystem will return false while DistributedFileSystem will throw 
> {{org.apache.hadoop.security.AccessControlException: Permission denied: ..., 
> access=EXECUTE, ...}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10341) Add a metric to expose the timeout number of pending replication blocks

2016-04-28 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-10341:
-
Target Version/s: 3.0.0
  Status: Patch Available  (was: Open)

> Add a metric to expose the timeout number of pending replication blocks
> ---
>
> Key: HDFS-10341
> URL: https://issues.apache.org/jira/browse/HDFS-10341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Attachments: HDFS-10341.01.patch
>
>
> Per HDFS-6682, recording the timeout number of pending replication blocks is 
> useful to get the cluster health.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10341) Add a metric to expose the timeout number of pending replication blocks

2016-04-28 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-10341:
-
Attachment: HDFS-10341.01.patch

> Add a metric to expose the timeout number of pending replication blocks
> ---
>
> Key: HDFS-10341
> URL: https://issues.apache.org/jira/browse/HDFS-10341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Attachments: HDFS-10341.01.patch
>
>
> Per HDFS-6682, recording the timeout number of pending replication blocks is 
> useful to get the cluster health.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-10341) Add a metric to expose the timeout number of pending replication blocks

2016-04-28 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA reassigned HDFS-10341:


Assignee: Akira AJISAKA

> Add a metric to expose the timeout number of pending replication blocks
> ---
>
> Key: HDFS-10341
> URL: https://issues.apache.org/jira/browse/HDFS-10341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>
> Per HDFS-6682, recording the timeout number of pending replication blocks is 
> useful to get the cluster health.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-28 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261760#comment-15261760
 ] 

Elliot West commented on HDFS-10338:


[~liuml07], [~linyiqun] I agree with all of the comments made and thank you 
both for taking the time to look into this.

I look forward to seeing the patch.

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRCs retrievals 
> fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10341) Add a metric to expose the timeout number of pending replication blocks

2016-04-28 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HDFS-10341:


 Summary: Add a metric to expose the timeout number of pending 
replication blocks
 Key: HDFS-10341
 URL: https://issues.apache.org/jira/browse/HDFS-10341
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Akira AJISAKA


Per HDFS-6682, recording the timeout number of pending replication blocks is 
useful to get the cluster health.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block

2016-04-28 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261709#comment-15261709
 ] 

Akira AJISAKA commented on HDFS-6682:
-

bq. recording the timeout number of pending replication blocks is useful to get 
the cluster health.
Filed HDFS-10341.

> Add a metric to expose the timestamp of the oldest under-replicated block
> -
>
> Key: HDFS-6682
> URL: https://issues.apache.org/jira/browse/HDFS-6682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>  Labels: metrics
> Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, 
> HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch
>
>
> In the following case, the data in the HDFS is lost and a client needs to put 
> the same file again.
> # A Client puts a file to HDFS
> # A DataNode crashes before replicating a block of the file to other DataNodes
> I propose a metric to expose the timestamp of the oldest 
> under-replicated/corrupt block. That way the client can know which file to 
> retain for the retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10340) data node sudden killed

2016-04-28 Thread tu nguyen khac (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tu nguyen khac updated HDFS-10340:
--
Description: 
I tried to set up a new data node using Ubuntu 16 
and get it to join an existing Hadoop HDFS cluster (there are 10 nodes in this 
cluster and they all run on CentOS 6). 
But when I try to bootstrap this node, after about 10 or 20 minutes I get 
these strange errors: 

2016-04-26 20:12:09,394 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/10.3.24.65:55323, dest: /10.3.24.197:50010, bytes: 79902, op: HDFS_WRITE, 
cliID: DFSClient_NONMAPREDUCE_1379996362_1, offset: 0, srvID: 
225f5b43-1dd3-4ac6-88d2-1e8d27dba55b, blockid: 
BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, duration: 15331628
2016-04-26 20:12:09,394 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder: BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, 
type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-04-26 20:12:25,410 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038502_789829
2016-04-26 20:12:25,411 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832
2016-04-26 20:13:18,546 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Scheduling blk_1074038502_789829 file 
/data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
 for deletion
2016-04-26 20:13:18,562 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Deleted BP-352432948-10.3.24.65-1433821675295 blk_1074038502_789829 file 
/data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
2016-04-26 20:15:46,481 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
RECEIVED SIGNAL 15: SIGTERM
2016-04-26 20:15:46,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down DataNode at bigdata-dw-24-197/10.3.24.197
/


  was:
I tried to set up a new data node using Ubuntu 16 
and get it to join an existing Hadoop HDFS cluster (there are 10 nodes in this 
cluster). 
But when I try to bootstrap this node, after about 10 or 20 minutes I get 
these strange errors: 

2016-04-26 20:12:09,394 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/10.3.24.65:55323, dest: /10.3.24.197:50010, bytes: 79902, op: HDFS_WRITE, 
cliID: DFSClient_NONMAPREDUCE_1379996362_1, offset: 0, srvID: 
225f5b43-1dd3-4ac6-88d2-1e8d27dba55b, blockid: 
BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, duration: 15331628
2016-04-26 20:12:09,394 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder: BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, 
type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-04-26 20:12:25,410 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038502_789829
2016-04-26 20:12:25,411 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832
2016-04-26 20:13:18,546 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Scheduling blk_1074038502_789829 file 
/data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
 for deletion
2016-04-26 20:13:18,562 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Deleted BP-352432948-10.3.24.65-1433821675295 blk_1074038502_789829 file 
/data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
2016-04-26 20:15:46,481 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
RECEIVED SIGNAL 15: SIGTERM
2016-04-26 20:15:46,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down DataNode at bigdata-dw-24-197/10.3.24.197
/



> data node sudden killed 
> 
>
> Key: HDFS-10340
> URL: https://issues.apache.org/jira/browse/HDFS-10340
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
> Environment: Ubuntu 16.04 LTS , RAM 16g , cpu core : 8 , hdd 100gb, 
> hadoop 2.6.0
>   

[jira] [Created] (HDFS-10340) data node sudden killed

2016-04-28 Thread tu nguyen khac (JIRA)
tu nguyen khac created HDFS-10340:
-

 Summary: data node sudden killed 
 Key: HDFS-10340
 URL: https://issues.apache.org/jira/browse/HDFS-10340
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
 Environment: Ubuntu 16.04 LTS , RAM 16g , cpu core : 8 , hdd 100gb, 
hadoop 2.6.0
Reporter: tu nguyen khac
Priority: Critical


I tried to set up a new data node using Ubuntu 16 
and get it to join an existing Hadoop HDFS cluster (there are 10 nodes in this 
cluster). 
But when I try to bootstrap this node, after about 10 or 20 minutes I get 
these strange errors: 

2016-04-26 20:12:09,394 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/10.3.24.65:55323, dest: /10.3.24.197:50010, bytes: 79902, op: HDFS_WRITE, 
cliID: DFSClient_NONMAPREDUCE_1379996362_1, offset: 0, srvID: 
225f5b43-1dd3-4ac6-88d2-1e8d27dba55b, blockid: 
BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, duration: 15331628
2016-04-26 20:12:09,394 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder: BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, 
type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-04-26 20:12:25,410 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038502_789829
2016-04-26 20:12:25,411 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832
2016-04-26 20:13:18,546 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Scheduling blk_1074038502_789829 file 
/data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
 for deletion
2016-04-26 20:13:18,562 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Deleted BP-352432948-10.3.24.65-1433821675295 blk_1074038502_789829 file 
/data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
2016-04-26 20:15:46,481 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
RECEIVED SIGNAL 15: SIGTERM
2016-04-26 20:15:46,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down DataNode at bigdata-dw-24-197/10.3.24.197
/




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

2016-04-28 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261637#comment-15261637
 ] 

Kai Zheng commented on HDFS-10285:
--

Thanks Uma.
bq. Could you elaborate a bit?
Sure. I meant that the current distcp tool can specify with options whether or 
not to preserve (copy over) the original file settings, like block size, when 
copying a file from the source to the destination folder. I suggested we might 
also consider adding an option to specify whether or not to preserve the 
storage policy property as well, if that sounds helpful in the mentioned 
scenario.

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.7.2
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> Heterogeneous storage in HDFS introduced the concept of storage policy. These 
> policies can be set on a directory/file to specify the user's preference for 
> where to store the physical blocks. When the user sets the storage policy 
> before writing data, the blocks can take advantage of the storage policy 
> preferences and the physical blocks are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> the blocks will already have been written with the default storage policy 
> (nothing but DISK). The user has to run the ‘Mover tool’ explicitly by 
> specifying all such file names as a list. In some distributed system 
> scenarios (ex: HBase) it would be difficult to collect all the files and run 
> the tool, as different nodes can write files separately and files can have 
> different paths.
> Another scenario is when the user renames files from a directory with one 
> effective storage policy (inherited from the parent directory) to a directory 
> with another effective storage policy: the rename does not carry the 
> inherited storage policy from the source, so the file takes its policy from 
> the destination file/dir parent. This rename operation is just a metadata 
> change in the Namenode. The physical blocks still remain with the source 
> storage policy.
> So, tracking all such business-logic-based file names from distributed nodes 
> (ex: region servers) and running the Mover tool could be difficult for 
> admins. Here the proposal is to provide an API from the Namenode itself to 
> trigger storage policy satisfaction. A daemon thread inside the Namenode 
> should track such calls and send movement commands to the DNs. 
> Will post the detailed design thoughts document soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)