[jira] [Updated] (HDFS-9624) DataNode start slowly due to the initial DU command operations

2016-01-06 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9624:

Attachment: HDFS-9624.001.patch

> DataNode start slowly due to the initial DU command operations
> --
>
> Key: HDFS-9624
> URL: https://issues.apache.org/jira/browse/HDFS-9624
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9624.001.patch
>
>
> The DataNode seems to start very slowly after I finished migrating the 
> datanodes and restarted them. Looking at the DN logs:
> {code}
> 2016-01-06 16:05:08,118 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
> new volume: DS-70097061-42f8-4c33-ac27-2a6ca21e60d4
> 2016-01-06 16:05:08,118 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
> volume - /home/data/data/hadoop/dfs/data/data12/current, StorageType: DISK
> 2016-01-06 16:05:08,176 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Registered FSDatasetState MBean
> 2016-01-06 16:05:08,177 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544
> 2016-01-06 16:05:08,178 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data2/current...
> 2016-01-06 16:05:08,179 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data3/current...
> 2016-01-06 16:05:08,179 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data4/current...
> 2016-01-06 16:05:08,179 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data5/current...
> 2016-01-06 16:05:08,180 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data6/current...
> 2016-01-06 16:05:08,180 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data7/current...
> 2016-01-06 16:05:08,180 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data8/current...
> 2016-01-06 16:05:08,180 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data9/current...
> 2016-01-06 16:05:08,181 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data10/current...
> 2016-01-06 16:05:08,181 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data11/current...
> 2016-01-06 16:05:08,181 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
> /home/data/data/hadoop/dfs/data/data12/current...
> 2016-01-06 16:09:49,646 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time 
> taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
> /home/data/data/hadoop/dfs/data/data7/current: 281466ms
> 2016-01-06 16:09:54,235 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time 
> taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
> /home/data/data/hadoop/dfs/data/data9/current: 286054ms
> 2016-01-06 16:09:57,859 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time 
> taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
> /home/data/data/hadoop/dfs/data/data2/current: 289680ms
> 2016-01-06 16:10:00,333 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time 
> taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
> /home/data/data/hadoop/dfs/data/data5/current: 292153ms
> 2016-01-06 16:10:05,696 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.Fs

[jira] [Created] (HDFS-9624) DataNode start slowly due to the initial DU command operations

2016-01-06 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-9624:
---

 Summary: DataNode start slowly due to the initial DU command 
operations
 Key: HDFS-9624
 URL: https://issues.apache.org/jira/browse/HDFS-9624
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun


The DataNode seems to start very slowly after I finished migrating the 
datanodes and restarted them. Looking at the DN logs:
{code}
2016-01-06 16:05:08,118 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new 
volume: DS-70097061-42f8-4c33-ac27-2a6ca21e60d4
2016-01-06 16:05:08,118 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
volume - /home/data/data/hadoop/dfs/data/data12/current, StorageType: DISK
2016-01-06 16:05:08,176 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered 
FSDatasetState MBean
2016-01-06 16:05:08,177 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544
2016-01-06 16:05:08,178 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data2/current...
2016-01-06 16:05:08,179 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data3/current...
2016-01-06 16:05:08,179 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data4/current...
2016-01-06 16:05:08,179 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data5/current...
2016-01-06 16:05:08,180 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data6/current...
2016-01-06 16:05:08,180 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data7/current...
2016-01-06 16:05:08,180 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data8/current...
2016-01-06 16:05:08,180 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data9/current...
2016-01-06 16:05:08,181 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data10/current...
2016-01-06 16:05:08,181 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data11/current...
2016-01-06 16:05:08,181 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume 
/home/data/data/hadoop/dfs/data/data12/current...
2016-01-06 16:09:49,646 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken 
to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
/home/data/data/hadoop/dfs/data/data7/current: 281466ms
2016-01-06 16:09:54,235 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken 
to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
/home/data/data/hadoop/dfs/data/data9/current: 286054ms
2016-01-06 16:09:57,859 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken 
to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
/home/data/data/hadoop/dfs/data/data2/current: 289680ms
2016-01-06 16:10:00,333 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken 
to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
/home/data/data/hadoop/dfs/data/data5/current: 292153ms
2016-01-06 16:10:05,696 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken 
to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
/home/data/data/hadoop/dfs/data/data8/current: 297516ms
2016-01-06 16:10:11,229 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken 
to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on 
/home/data/data/hadoop/dfs/data/data6/current: 303049ms
2016-01-06 16:10:28,075 INFO 
org.apac

[jira] [Updated] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages

2016-01-06 Thread Wei Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhou updated HDFS-9608:
---
Attachment: HDFS-9608.02.patch

Thanks Kai for the helpful suggestions! I made modifications to the previous 
patch accordingly. Sorry for Item 2, I just forgot to delete it. Thanks!

> Disk IO imbalance in HDFS with heterogeneous storages
> -
>
> Key: HDFS-9608
> URL: https://issues.apache.org/jira/browse/HDFS-9608
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei Zhou
>Assignee: Wei Zhou
> Attachments: HDFS-9608.01.patch, HDFS-9608.02.patch
>
>
> Currently RoundRobinVolumeChoosingPolicy uses a shared index to choose volumes 
> in HDFS with heterogeneous storages; this leads to a non-RR choosing mode for 
> certain types of storage.
> Besides, it uses a shared lock for synchronization, which limits the 
> concurrency of the volume-choosing process. Volume-choosing threads that 
> operate on different storage types should be able to run concurrently. 
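
A minimal sketch of the per-storage-type round-robin idea, as a standalone Java 
class (the class and method names are illustrative, not the attached patch):

{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// One round-robin cursor per storage type: choices for DISK no longer
// disturb the cursor for SSD or ARCHIVE, and no lock is shared across types.
public class PerTypeRoundRobinChooser {
  private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

  public String choose(String storageType, List<String> volumes) {
    AtomicInteger cursor =
        cursors.computeIfAbsent(storageType, t -> new AtomicInteger());
    int i = Math.floorMod(cursor.getAndIncrement(), volumes.size());
    return volumes.get(i);
  }
}
{code}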



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8356) Document missing properties in hdfs-default.xml

2016-01-06 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086939#comment-15086939
 ] 

Ray Chiang commented on HDFS-8356:
--

RE: Failing unit tests

This is a different set than the previous run, and both tests pass using JDK 8 
in my tree.

> Document missing properties in hdfs-default.xml
> ---
>
> Key: HDFS-8356
> URL: https://issues.apache.org/jira/browse/HDFS-8356
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability, test
> Attachments: HDFS-8356.001.patch, HDFS-8356.002.patch, 
> HDFS-8356.003.patch, HDFS-8356.004.patch
>
>
> The following properties are currently not defined in hdfs-default.xml. These 
> properties should either be
> A) documented in hdfs-default.xml OR
> B) listed as an exception (with comments, e.g. for internal use) in the 
> TestHdfsConfigFields unit test
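
A hedged sketch of option B, following the pattern of Hadoop's 
TestConfigurationFieldsBase (the member names below are assumptions based on 
that base class; the skipped property key is hypothetical):

{code}
import java.util.HashSet;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class TestHdfsConfigFields extends TestConfigurationFieldsBase {
  @Override
  public void initializeMemberVariables() {
    xmlFilename = "hdfs-default.xml";
    configurationClasses = new Class[] { DFSConfigKeys.class };
    configurationPropsToSkipCompare = new HashSet<String>();
    // Exception: hypothetical internal-use key, intentionally undocumented.
    configurationPropsToSkipCompare.add("dfs.namenode.some.internal.key");
  }
}
{code}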



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9600) do not check replication if the block is under construction

2016-01-06 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086927#comment-15086927
 ] 

Vinayakumar B commented on HDFS-9600:
-

Merged to branch-2.8 as well.

> do not check replication if the block is under construction
> ---
>
> Key: HDFS-9600
> URL: https://issues.apache.org/jira/browse/HDFS-9600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
>Priority: Critical
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, 
> HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, 
> HDFS-9600-v3.patch, HDFS-9600-v4.patch
>
>
> When appending a file, we update the pipeline to bump a new GS, and the old 
> GS is considered out of date. When changing the GS, 
> BlockInfo.setGenerationStampAndVerifyReplicas removes replicas that have the 
> old GS, which means all replicas are removed, because no DN has the new GS 
> until the block with the new GS is added to blockMaps again by 
> DatanodeProtocol.blockReceivedAndDeleted.
> If we check replication of this block before it is added back, it will be 
> regarded as missing. The probability is low, but if there are decommissioning 
> nodes, the DecommissionManager.Monitor scans all blocks belonging to 
> decommissioning nodes at a very fast speed, so the probability of finding a 
> "missing" block is very high even though the blocks are not actually missing. 
> Furthermore, after closing the appended file, 
> FSNamesystem.finalizeINodeFileUnderConstruction calls checkReplication. If 
> some nodes are decommissioning, the block with the new GS is added to the 
> UnderReplicatedBlocks map, so there are two blocks with the same ID in this 
> map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in 
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. There will then be many 
> missing-block warnings on the NameNode website even though there are no 
> corrupt files.
> Therefore, I think the solution is that we should not check replication if 
> the block is under construction. We only check complete blocks.
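
A self-contained sketch of the proposed rule (illustrative types, not the 
actual NameNode code): replication is checked only for COMPLETE blocks, so an 
under-construction block whose replicas are still being re-reported with the 
new GS is never flagged as missing.

{code}
enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

class ReplicationCheck {
  private final int expectedReplication;

  ReplicationCheck(int expectedReplication) {
    this.expectedReplication = expectedReplication;
  }

  /** Skip blocks that are not yet complete. */
  boolean needsReplication(BlockState state, int reportedReplicas) {
    if (state != BlockState.COMPLETE) {
      return false; // UC block: replicas with the new GS may not be reported yet
    }
    return reportedReplicas < expectedReplication;
  }
}
{code}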



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages

2016-01-06 Thread Wei Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhou updated HDFS-9608:
---
Description: 
Currently RoundRobinVolumeChoosingPolicy uses a shared index to choose volumes 
in HDFS with heterogeneous storages; this leads to a non-RR choosing mode for 
certain types of storage.
Besides, it uses a shared lock for synchronization, which limits the concurrency 
of the volume-choosing process. Volume-choosing threads that operate on 
different storage types should be able to run concurrently. 

  was:Currently RoundRobinVolumeChoosingPolicy uses a shared index to choose 
volumes in HDFS with heterogeneous storages; this leads to a non-RR choosing 
mode for certain types of storage. 


> Disk IO imbalance in HDFS with heterogeneous storages
> -
>
> Key: HDFS-9608
> URL: https://issues.apache.org/jira/browse/HDFS-9608
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei Zhou
>Assignee: Wei Zhou
> Attachments: HDFS-9608.01.patch
>
>
> Currently RoundRobinVolumeChoosingPolicy uses a shared index to choose volumes 
> in HDFS with heterogeneous storages; this leads to a non-RR choosing mode for 
> certain types of storage.
> Besides, it uses a shared lock for synchronization, which limits the 
> concurrency of the volume-choosing process. Volume-choosing threads that 
> operate on different storage types should be able to run concurrently. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9600) do not check replication if the block is under construction

2016-01-06 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086911#comment-15086911
 ] 

Vinayakumar B commented on HDFS-9600:
-

Committed to trunk, branch-2, branch-2.7 and branch-2.6

Thanks all.

> do not check replication if the block is under construction
> ---
>
> Key: HDFS-9600
> URL: https://issues.apache.org/jira/browse/HDFS-9600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
>Priority: Critical
> Fix For: 2.7.3, 2.6.4
>
> Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, 
> HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, 
> HDFS-9600-v3.patch, HDFS-9600-v4.patch
>
>
> When appending a file, we update the pipeline to bump a new GS, and the old 
> GS is considered out of date. When changing the GS, 
> BlockInfo.setGenerationStampAndVerifyReplicas removes replicas that have the 
> old GS, which means all replicas are removed, because no DN has the new GS 
> until the block with the new GS is added to blockMaps again by 
> DatanodeProtocol.blockReceivedAndDeleted.
> If we check replication of this block before it is added back, it will be 
> regarded as missing. The probability is low, but if there are decommissioning 
> nodes, the DecommissionManager.Monitor scans all blocks belonging to 
> decommissioning nodes at a very fast speed, so the probability of finding a 
> "missing" block is very high even though the blocks are not actually missing. 
> Furthermore, after closing the appended file, 
> FSNamesystem.finalizeINodeFileUnderConstruction calls checkReplication. If 
> some nodes are decommissioning, the block with the new GS is added to the 
> UnderReplicatedBlocks map, so there are two blocks with the same ID in this 
> map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in 
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. There will then be many 
> missing-block warnings on the NameNode website even though there are no 
> corrupt files.
> Therefore, I think the solution is that we should not check replication if 
> the block is under construction. We only check complete blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages

2016-01-06 Thread Wei Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhou updated HDFS-9608:
---
Description: Currently RoundRobinVolumeChoosingPolicy uses a shared index to 
choose volumes in HDFS with heterogeneous storages; this leads to a non-RR 
choosing mode for certain types of storage.   (was: Currently 
RoundRobinVolumeChoosingPolicy uses a shared index to choose volumes in HDFS 
with heterogeneous storages; this leads to a non-RR choosing mode for certain 
types of storage.)

> Disk IO imbalance in HDFS with heterogeneous storages
> -
>
> Key: HDFS-9608
> URL: https://issues.apache.org/jira/browse/HDFS-9608
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei Zhou
>Assignee: Wei Zhou
> Attachments: HDFS-9608.01.patch
>
>
> Currently RoundRobinVolumeChoosingPolicy uses a shared index to choose volumes 
> in HDFS with heterogeneous storages; this leads to a non-RR choosing mode for 
> certain types of storage. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9600) do not check replication if the block is under construction

2016-01-06 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086858#comment-15086858
 ] 

Vinayakumar B commented on HDFS-9600:
-

bq. The native build fails when libwebhdfs in contrib is built. This is not the 
case if you simply do -Pnative. I think it is HDFS-8346.
That might be another reason for the failure in branch-2.
But I have seen this with both docker and non-docker modes in branch-2.6. It 
fails in branch-2.6 in docker mode because dev-support/DockerFile does not 
exist in branch-2.6.

> do not check replication if the block is under construction
> ---
>
> Key: HDFS-9600
> URL: https://issues.apache.org/jira/browse/HDFS-9600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
>Priority: Critical
> Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, 
> HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, 
> HDFS-9600-v3.patch, HDFS-9600-v4.patch
>
>
> When appending a file, we update the pipeline to bump a new GS, and the old 
> GS is considered out of date. When changing the GS, 
> BlockInfo.setGenerationStampAndVerifyReplicas removes replicas that have the 
> old GS, which means all replicas are removed, because no DN has the new GS 
> until the block with the new GS is added to blockMaps again by 
> DatanodeProtocol.blockReceivedAndDeleted.
> If we check replication of this block before it is added back, it will be 
> regarded as missing. The probability is low, but if there are decommissioning 
> nodes, the DecommissionManager.Monitor scans all blocks belonging to 
> decommissioning nodes at a very fast speed, so the probability of finding a 
> "missing" block is very high even though the blocks are not actually missing. 
> Furthermore, after closing the appended file, 
> FSNamesystem.finalizeINodeFileUnderConstruction calls checkReplication. If 
> some nodes are decommissioning, the block with the new GS is added to the 
> UnderReplicatedBlocks map, so there are two blocks with the same ID in this 
> map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in 
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. There will then be many 
> missing-block warnings on the NameNode website even though there are no 
> corrupt files.
> Therefore, I think the solution is that we should not check replication if 
> the block is under construction. We only check complete blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9623) Update example configuration of block state change log in log4j.properties

2016-01-06 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-9623:
---
Attachment: HDFS-9623.001.patch

> Update example configuration of block state change log in log4j.properties
> --
>
> Key: HDFS-9623
> URL: https://issues.apache.org/jira/browse/HDFS-9623
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 2.8.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9623.001.patch
>
>
> The log level of the block state change log was changed from INFO to DEBUG by 
> HDFS-6860. The example configuration in log4j.properties should be updated 
> accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9623) Update example configuration of block state change log in log4j.properties

2016-01-06 Thread Masatake Iwasaki (JIRA)
Masatake Iwasaki created HDFS-9623:
--

 Summary: Update example configuration of block state change log in 
log4j.properties
 Key: HDFS-9623
 URL: https://issues.apache.org/jira/browse/HDFS-9623
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: logging
Affects Versions: 2.8.0
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Minor


The log level of the block state change log was changed from INFO to DEBUG by 
HDFS-6860. The example configuration in log4j.properties should be updated 
accordingly.
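
For illustration, the updated example could look like the following, assuming 
the {{BlockStateChange}} logger name used for the block state change log (a 
sketch, not the actual patch):

{code}
# Block state change messages are logged at DEBUG since HDFS-6860, so the
# commented-out example should suggest DEBUG rather than INFO:
#log4j.logger.BlockStateChange=DEBUG
{code}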



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory

2016-01-06 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086760#comment-15086760
 ] 

Kai Zheng commented on HDFS-9621:
-

Thanks for the work, Jing. The patch looks great. Some minor comments:
1. For this change:
{code}
+  ecPolicy = fileNode.isStriped() ? ecPolicy : null;
{code}
How about:
{code}
ecPolicy = null;
if (fileNode.isStriped()) {
  ecPolicy = FSDirErasureCodingOp.getErasureCodingPolicy(
      fsd.getFSNamesystem(), iip);
}
{code}

2. For this code:
{code}
+DirectoryListing listing = fs.getClient().listPaths(dir.toString(),
+new byte[0], false);
+HdfsFileStatus[] files = listing.getPartialListing();
+assertNotNull(files[0].getErasureCodingPolicy()); // ecSubDir
+assertNull(files[1].getErasureCodingPolicy()); // replicatedFile
{code}
It might not be very reliable to rely on the listed entry order, considering 
the implementation of {{listPaths}} or {{getPartialListing}} may change.
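
One order-independent alternative, assuming the same test context as the patch 
(the file names are taken from the comments above; {{getLocalName}} is the 
standard {{HdfsFileStatus}} accessor, with {{java.util.HashMap}} imported):

{code}
Map<String, HdfsFileStatus> byName = new HashMap<>();
for (HdfsFileStatus status : listing.getPartialListing()) {
  byName.put(status.getLocalName(), status);
}
// Assertions no longer depend on the order returned by getPartialListing().
assertNotNull(byName.get("ecSubDir").getErasureCodingPolicy());
assertNull(byName.get("replicatedFile").getErasureCodingPolicy());
{code}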

> getListing wrongly associates Erasure Coding policy to pre-existing 
> replicated files under an EC directory  
> 
>
> Key: HDFS-9621
> URL: https://issues.apache.org/jira/browse/HDFS-9621
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Jing Zhao
>Priority: Critical
> Attachments: HDFS-9621.000.patch
>
>
> This is reported by [~ssreenivasan]:
> If we set Erasure Coding policy to a directory which contains some files with 
> replicated blocks, later when listing files under the directory these files 
> will be reported as EC files. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086707#comment-15086707
 ] 

Hadoop QA commented on HDFS-9621:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 11s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 49m 50s 
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 134m 42s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestDistributedFileSystem |
|   | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780865/HDFS-9621.000.patch |
| JIRA Issue | HDFS-9621 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 51ade031330c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchproce

[jira] [Commented] (HDFS-9047) Retire libwebhdfs

2016-01-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086683#comment-15086683
 ] 

Haohui Mai commented on HDFS-9047:
--

Looks like there is no effort on fixing anything. IMO +1 on removing them in 
2.6 / 2.7 if it's breaking the pre-commit builds, but I'll leave the decision 
to the release manager.

> Retire libwebhdfs
> -
>
> Key: HDFS-9047
> URL: https://issues.apache.org/jira/browse/HDFS-9047
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Allen Wittenauer
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-9047.000.patch
>
>
> This library is basically a mess:
> * It's not part of the mvn package
> * It's missing functionality and barely maintained
> * It's not in the precommit runs so doesn't get exercised regularly
> * It's not part of the unit tests (at least, that I can see)
> * It isn't documented in any official documentation
> But most importantly:  
> * It fails at its primary mission of being pure C (HDFS-3917 is STILL open)
> Let's cut our losses and just remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9047) Retire libwebhdfs

2016-01-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086683#comment-15086683
 ] 

Haohui Mai edited comment on HDFS-9047 at 1/7/16 2:25 AM:
--

Looks like there is no effort on fixing anything. IMO +1 on removing them in 
2.6 / 2.7 if it's breaking the pre-commit builds, but I'll leave it to the 
release manager to make the call.


was (Author: wheat9):
Looks like there is no effort on fixing anything. IMO +1 on removing them in 
2.6 / 2.7 if it's breaking the pre-commit builds, but I'll leave the decision 
to the release manager.

> Retire libwebhdfs
> -
>
> Key: HDFS-9047
> URL: https://issues.apache.org/jira/browse/HDFS-9047
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Allen Wittenauer
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-9047.000.patch
>
>
> This library is basically a mess:
> * It's not part of the mvn package
> * It's missing functionality and barely maintained
> * It's not in the precommit runs so doesn't get exercised regularly
> * It's not part of the unit tests (at least, that I can see)
> * It isn't documented in any official documentation
> But most importantly:  
> * It fails at its primary mission of being pure C (HDFS-3917 is STILL open)
> Let's cut our losses and just remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS

2016-01-06 Thread Dinesh S. Atreya (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086662#comment-15086662
 ] 

Dinesh S. Atreya commented on HDFS-9607:


Both [~ste...@apache.org] and [~wheat9] have raised good points. 

I can only work on the design doc in my spare time. 

I don't want the semantics of "update" to be too different from "append". 
Kindly indicate the semantics of "append" vis-a-vis the above, if folks know 
and remember (in parallel, I will dig that information up).

As a start, we will reuse the [Generation Stamp | 
https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc] of 
blocks from the Append effort.



> Advance Hadoop Architecture (AHA) - HDFS
> 
>
> Key: HDFS-9607
> URL: https://issues.apache.org/jira/browse/HDFS-9607
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Dinesh S. Atreya
>
> Link to Umbrella JIRA
> https://issues.apache.org/jira/browse/HADOOP-12620 
> Provide the capability to carry out in-place writes/updates. Only in-place 
> writes are supported, where the existing length does not change.
> For example, "Hello World" can be replaced by "Hello HDFS!"
> See 
> https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300
>  for more details.
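
Purely as an illustration of the proposed semantics (HDFS has no such API 
today; the helper below runs on a local file): an in-place update may only 
overwrite bytes within the current length.

{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class InPlaceUpdateDemo {
  // Length-preserving overwrite, mirroring "Hello World" -> "Hello HDFS!".
  static void updateInPlace(RandomAccessFile f, long offset, byte[] data)
      throws IOException {
    if (offset + data.length > f.length()) {
      throw new IOException("in-place update must not change the file length");
    }
    f.seek(offset);
    f.write(data);
  }

  public static void main(String[] args) throws IOException {
    try (RandomAccessFile f = new RandomAccessFile("demo.txt", "rw")) {
      f.setLength(0);
      f.write("Hello World".getBytes(StandardCharsets.UTF_8));
      updateInPlace(f, 6, "HDFS!".getBytes(StandardCharsets.UTF_8));
    }
  }
}
{code}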



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9622) get Block Locations is always unstable

2016-01-06 Thread lichao liu (JIRA)
lichao liu created HDFS-9622:


 Summary: get Block Locations is always unstable
 Key: HDFS-9622
 URL: https://issues.apache.org/jira/browse/HDFS-9622
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
 Environment: CDH5.5.0
Reporter: lichao liu


Query speed is slow in Impala; I am using CDH 5.5.0.

Monitoring the Impala backend log, I found that the time-consuming, 
long-running queries show abnormal output, as follows:

Tuple(id=0 size=40 slots=[Slot(id=0 type=STRING col_path=[4] offset=24 
null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=1 type=BIGINT 
col_path=[5] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), 
Slot(id=2 type=BIGINT col_path=[6] offset=16 null=(offset=0 mask=2) slot_idx=1 
field_idx=-1), Slot(id=3 type=STRING col_path=[0] offset=-1 null=(offset=0 
mask=1) slot_idx=0 field_idx=-1), Slot(id=4 type=STRING col_path=[1] offset=-1 
null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=5 type=STRING 
col_path=[2] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1)] 
tuple_path=[])
Tuple(id=1 size=40 slots=[Slot(id=6 type=STRING col_path=[] offset=24 
null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=7 type=BIGINT 
col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=8 
type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 
field_idx=-1)] tuple_path=[])
Tuple(id=2 size=40 slots=[Slot(id=9 type=STRING col_path=[] offset=24 
null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=10 type=BIGINT 
col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), 
Slot(id=11 type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 
field_idx=-1)] tuple_path=[])
I0106 09:46:59.656497 19278 plan-fragment-executor.cc:303] Open(): 
instance_id=794f58dadaa44cb8:1f24c33dda8d00a2
I0106 09:47:20.070286  6805 RetryInvocationHandler.java:144] Exception while 
invoking getBlockLocations of class ClientNamenodeProtocolTranslatorPB over 
namenode1:8020. Trying to fail over immediately.
Java exception follows:
org.apache.hadoop.net.ConnectTimeoutException: Call From datanode to 
namenode1:8020 failed on socket timeout exception: 
org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while 
waiting for channel to be ready for connect. ch : 
java.nio.channels.SocketChannel[connection-pending namenode1:8020]; For more 
details see:  http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1258)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1245)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1233)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:302)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:268)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:260)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1564)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:308)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:304)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout 
while waiting for channel to be read

[jira] [Updated] (HDFS-9620) Slow writer may fail permanently if pipeline breaks.

2016-01-06 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9620:
--
Component/s: security
 hdfs-client

> Slow writer may fail permanently if pipeline breaks.
> 
>
> Key: HDFS-9620
> URL: https://issues.apache.org/jira/browse/HDFS-9620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, security
>Reporter: Kihwal Lee
>Priority: Critical
>
> During a block write to a datanode, if the block write time exceeds the block 
> token expiration, the client will not be able to reestablish a block output 
> stream. E.g. if a node in the pipeline dies, the pipeline recovery won't work.
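
For reference, the knob involved is the block access token lifetime: a block 
write that outlives the token cannot re-authenticate during pipeline recovery. 
A hedged hdfs-site.xml example ({{dfs.block.access.token.lifetime}} is 
expressed in minutes; the value below is only illustrative):

{code}
<property>
  <name>dfs.block.access.token.lifetime</name>
  <!-- Lifetime in minutes; a writer slower than this cannot reestablish
       a block output stream after a pipeline failure. -->
  <value>600</value>
</property>
{code}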



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order

2016-01-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086634#comment-15086634
 ] 

Chris Douglas commented on HDFS-8891:
-

bq. shall we cherry-pick this fix to 2.6.4 as well?

Yes, it [looks 
like|https://git1-us-west.apache.org/repos/asf?p=hadoop.git;a=blob;f=hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java;h=bd4f555110f9abdd4583041a4e7c8f0670cdc844;hb=branch-2.6#l2039]
 this is also in branch-2.6.

> HDFS concat should keep srcs order
> --
>
> Key: HDFS-8891
> URL: https://issues.apache.org/jira/browse/HDFS-8891
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yong Zhang
>Assignee: Yong Zhang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch
>
>
> FSDirConcatOp.verifySrcFiles may change the order of the src files, but it 
> should keep their order as given in the input.
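
The pitfall is easy to reproduce with plain Java collections: gathering the 
srcs into an unordered set permutes them, while an insertion-ordered set 
preserves the caller's order (a generic illustration, not the attached fix):

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class SrcOrderDemo {
  public static void main(String[] args) {
    Set<String> unordered = new HashSet<>(Arrays.asList("b", "a", "c"));
    Set<String> ordered = new LinkedHashSet<>(Arrays.asList("b", "a", "c"));
    System.out.println(unordered); // iteration order not guaranteed
    System.out.println(ordered);   // [b, a, c] -- input order kept
  }
}
{code}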



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9047) Retire libwebhdfs

2016-01-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086600#comment-15086600
 ] 

Junping Du commented on HDFS-9047:
--

Hi [~wheat9], what's the plan for branch-2.6/2.7? Remove it or fix it?

> Retire libwebhdfs
> -
>
> Key: HDFS-9047
> URL: https://issues.apache.org/jira/browse/HDFS-9047
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Allen Wittenauer
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-9047.000.patch
>
>
> This library is basically a mess:
> * It's not part of the mvn package
> * It's missing functionality and barely maintained
> * It's not in the precommit runs so doesn't get exercised regularly
> * It's not part of the unit tests (at least, that I can see)
> * It isn't documented in any official documentation
> But most importantly:  
> * It fails at its primary mission of being pure C (HDFS-3917 is STILL open)
> Let's cut our losses and just remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9047) Retire libwebhdfs

2016-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086599#comment-15086599
 ] 

Hudson commented on HDFS-9047:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #9061 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9061/])
HDFS-9047. Retire libwebhdfs. Contributed by Haohui Mai. (wheat9: rev 
c213ee085971483d737a2d4652adfda0f767eea0)
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/CMakeLists.txt
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_query.c
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_ops.c
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.h
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_threaded.c
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.h
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.c
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.c
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_web.c
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_query.h
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/CMakeLists.txt
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_read.c
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_write.c
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/resources/FindJansson.cmake


> Retire libwebhdfs
> -
>
> Key: HDFS-9047
> URL: https://issues.apache.org/jira/browse/HDFS-9047
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Allen Wittenauer
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-9047.000.patch
>
>
> This library is basically a mess:
> * It's not part of the mvn package
> * It's missing functionality and barely maintained
> * It's not in the precommit runs so doesn't get exercised regularly
> * It's not part of the unit tests (at least, that I can see)
> * It isn't documented in any official documentation
> But most importantly:  
> * It fails at its primary mission of being pure C (HDFS-3917 is STILL open)
> Let's cut our losses and just remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9047) Retire libwebhdfs

2016-01-06 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9047:
-
Release Note: libwebhdfs has been retired in 2.8.0 due to the lack of 
maintenance.

> Retire libwebhdfs
> -
>
> Key: HDFS-9047
> URL: https://issues.apache.org/jira/browse/HDFS-9047
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Allen Wittenauer
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-9047.000.patch
>
>
> This library is basically a mess:
> * It's not part of the mvn package
> * It's missing functionality and barely maintained
> * It's not in the precommit runs so doesn't get exercised regularly
> * It's not part of the unit tests (at least, that I can see)
> * It isn't documented in any official documentation
> But most importantly:  
> * It fails at its primary mission of being pure C (HDFS-3917 is STILL open)
> Let's cut our losses and just remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9047) Retire libwebhdfs

2016-01-06 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9047:
-
Target Version/s:   (was: 3.0.0)

> Retire libwebhdfs
> -
>
> Key: HDFS-9047
> URL: https://issues.apache.org/jira/browse/HDFS-9047
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Allen Wittenauer
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-9047.000.patch
>
>
> This library is basically a mess:
> * It's not part of the mvn package
> * It's missing functionality and barely maintained
> * It's not in the precommit runs so doesn't get exercised regularly
> * It's not part of the unit tests (at least, that I can see)
> * It isn't documented in any official documentation
> But most importantly:  
> * It fails at its primary mission of being pure C (HDFS-3917 is STILL open)
> Let's cut our losses and just remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9047) Retire libwebhdfs

2016-01-06 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9047:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed, Incompatible change
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk, branch-2 and branch-2.8. Thanks all for the 
reviews.

> Retire libwebhdfs
> -
>
> Key: HDFS-9047
> URL: https://issues.apache.org/jira/browse/HDFS-9047
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Allen Wittenauer
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-9047.000.patch
>
>
> This library is basically a mess:
> * It's not part of the mvn package
> * It's missing functionality and barely maintained
> * It's not in the precommit runs so doesn't get exercised regularly
> * It's not part of the unit tests (at least, that I can see)
> * It isn't documented in any official documentation
> But most importantly:  
> * It fails at its primary mission of being pure C (HDFS-3917 is STILL open)
> Let's cut our losses and just remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory

2016-01-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9621:

Status: Patch Available  (was: Open)

> getListing wrongly associates Erasure Coding policy to pre-existing 
> replicated files under an EC directory  
> 
>
> Key: HDFS-9621
> URL: https://issues.apache.org/jira/browse/HDFS-9621
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Jing Zhao
>Priority: Critical
> Attachments: HDFS-9621.000.patch
>
>
> This is reported by [~ssreenivasan]:
> If we set Erasure Coding policy to a directory which contains some files with 
> replicated blocks, later when listing files under the directory these files 
> will be reported as EC files. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory

2016-01-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9621:

Attachment: HDFS-9621.000.patch

Uploaded a patch to fix this.

> getListing wrongly associates Erasure Coding policy to pre-existing 
> replicated files under an EC directory  
> 
>
> Key: HDFS-9621
> URL: https://issues.apache.org/jira/browse/HDFS-9621
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Jing Zhao
>Priority: Critical
> Attachments: HDFS-9621.000.patch
>
>
> This is reported by [~ssreenivasan]:
> If we set Erasure Coding policy to a directory which contains some files with 
> replicated blocks, later when listing files under the directory these files 
> will be reported as EC files. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-06 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086570#comment-15086570
 ] 

Konstantin Shvachko commented on HDFS-8999:
---

> Let's test with the last block to see if it already solves the problem. I 
> hesitate to be so aggressive.

Did you test without this patch? How? Maybe the problem is already solved just 
with HDFS-1172, as I argued above.

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8999_20151228.patch, h8999_20160106.patch, 
> h8999_20160106b.patch, h8999_20160106c.patch
>
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let NameNode wait for all the block_received msgs to 
> announce the replica is safe. Looking into the code, now we have
># NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
># If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
># When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts in all the replicas that NN has already 
> received the block_received msg, but based on the above #2 and #3, it may be 
> safe to also count in all the replicas in the 
> BlockUnderConstructionFeature#replicas?
> {quote}
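
A minimal sketch of the counting rule under discussion (all names are 
illustrative; this is not the NameNode implementation): at close time, the NN 
could treat the pipeline's expected locations as safe in addition to the 
replicas already confirmed by block_received.

{code}
class CompleteCheck {
  /**
   * reportedReplicas: replicas confirmed via block_received messages.
   * expectedLocations: DataNodes in the final pipeline (from initial setup
   * or updatePipeline), whose last-packet acks imply a finalized replica.
   */
  static boolean canCompleteLastBlock(int reportedReplicas,
      int expectedLocations, int minReplication) {
    return Math.max(reportedReplicas, expectedLocations) >= minReplication;
  }
}
{code}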



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory

2016-01-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9621:

Issue Type: Sub-task  (was: Bug)
Parent: HDFS-8031

> getListing wrongly associates Erasure Coding policy to pre-existing 
> replicated files under an EC directory  
> 
>
> Key: HDFS-9621
> URL: https://issues.apache.org/jira/browse/HDFS-9621
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Jing Zhao
>Priority: Critical
>
> This is reported by [~ssreenivasan]:
> If we set Erasure Coding policy to a directory which contains some files with 
> replicated blocks, later when listing files under the directory these files 
> will be reported as EC files. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-06 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086562#comment-15086562
 ] 

Konstantin Shvachko commented on HDFS-8999:
---

COMPLETE state used to mean that the number of reported replicas is {{>= 
minReplication}}, not {{> 1}}. It would make sense to me to retain this logic.

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8999_20151228.patch, h8999_20160106.patch, 
> h8999_20160106b.patch, h8999_20160106c.patch
>
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let NameNode wait for all the block_received msgs to 
> announce the replica is safe. Looking into the code, now we have
># NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
># If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
># When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts in all the replicas that NN has already 
> received the block_received msg, but based on the above #2 and #3, it may be 
> safe to also count in all the replicas in the 
> BlockUnderConstructionFeature#replicas?
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory

2016-01-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9621:

Summary: getListing wrongly associates Erasure Coding policy to 
pre-existing replicated files under an EC directory  (was: {{getListing}} 
wrongly associates Erasure Coding policy to pre-existing replicated files under 
an EC directory)

> getListing wrongly associates Erasure Coding policy to pre-existing 
> replicated files under an EC directory  
> 
>
> Key: HDFS-9621
> URL: https://issues.apache.org/jira/browse/HDFS-9621
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Jing Zhao
>Priority: Blocker
>
> This is reported by [~ssreenivasan]:
> If we set an Erasure Coding policy on a directory that contains files with 
> replicated blocks, these files will later be reported as EC files when 
> listing the directory. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory

2016-01-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9621:

Priority: Critical  (was: Blocker)

> getListing wrongly associates Erasure Coding policy to pre-existing 
> replicated files under an EC directory  
> 
>
> Key: HDFS-9621
> URL: https://issues.apache.org/jira/browse/HDFS-9621
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Jing Zhao
>Priority: Critical
>
> This is reported by [~ssreenivasan]:
> If we set an Erasure Coding policy on a directory that contains files with 
> replicated blocks, these files will later be reported as EC files when 
> listing the directory. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9621) {{getListing}} wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory

2016-01-06 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-9621:
---

 Summary: {{getListing}} wrongly associates Erasure Coding policy 
to pre-existing replicated files under an EC directory  
 Key: HDFS-9621
 URL: https://issues.apache.org/jira/browse/HDFS-9621
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Affects Versions: 3.0.0
Reporter: Sushmitha Sreenivasan
Assignee: Jing Zhao
Priority: Blocker


This is reported by [~ssreenivasan]:

If we set an Erasure Coding policy on a directory that contains files with 
replicated blocks, these files will later be reported as EC files when 
listing the directory. 
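
A hedged repro sketch of the expected behavior, using standard HDFS client 
APIs (the path and the use of the getErasureCodingPolicy accessor are 
illustrative assumptions, not taken from this issue):

{code}
// Sketch: a file written with replicated blocks before the directory got an
// EC policy should still list with no EC policy of its own; the reported bug
// is that getListing marks such files as EC files.
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class EcListingCheck {
  static void check(DistributedFileSystem dfs) throws Exception {
    for (FileStatus st : dfs.listStatus(new Path("/ecdir"))) {
      // Expected for pre-existing replicated files: null (no EC policy).
      System.out.println(st.getPath() + " -> "
          + dfs.getErasureCodingPolicy(st.getPath()));
    }
  }
}
{code}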



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9047) Retire libwebhdfs

2016-01-06 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9047:
-
Summary: Retire libwebhdfs  (was: deprecate libwebhdfs in branch-2; remove 
from trunk)

> Retire libwebhdfs
> -
>
> Key: HDFS-9047
> URL: https://issues.apache.org/jira/browse/HDFS-9047
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Allen Wittenauer
>Assignee: Haohui Mai
> Attachments: HDFS-9047.000.patch
>
>
> This library is basically a mess:
> * It's not part of the mvn package
> * It's missing functionality and barely maintained
> * It's not in the precommit runs, so it doesn't get exercised regularly
> * It's not part of the unit tests (at least, that I can see)
> * It isn't documented in any official documentation
> But most importantly:  
> * It fails at its primary mission of being pure C (HDFS-3917 is STILL open)
> Let's cut our losses and just remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS

2016-01-06 Thread Dinesh S. Atreya (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086539#comment-15086539
 ] 

Dinesh S. Atreya commented on HDFS-9607:


Copying [comment | 
https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15083784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15083784]
 from parent/umbrella JIRA to here:

{quote}
[Haohui Mai | 
https://issues.apache.org/jira/secure/ViewProfile.jspa?name=wheat9] added a 
comment - Yesterday

I agree that the capabilities can be quite powerful. The real issue is how it 
can be done. There are some questions that need to be answered:

(1) What are the semantics of update-in-place, precisely, when there are 
failures? Is it atomic and transactional? What does the consistency model look 
like? What do the semantics and durability guarantees look like? For example, 
what happens if one of the DNs in the pipeline is down? What will the reader 
see?
(2) Once you define the semantics, is the specification meaningful and 
complete? Does it cover all the failure cases? How do you evaluate and prove 
there are no corner cases?
(3) How do you implement the semantics in code? What approach are you taking? 
Is it MVCC, a distributed transaction, or an ad-hoc solution tailored to HDFS?

So far we all agree that it is a useful capability. I don't think it requires 
more discussion to establish that it enables a number of new use cases.

However, I don't see this as a complete solution without addressing Steve's 
questions and all the questions above. It would be beneficial to have a design 
doc and a working prototype to clarify the confusion.

{quote}
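
To make the length-preserving constraint concrete, here is a local-filesystem 
analogy using java.io.RandomAccessFile; HDFS itself has no in-place update API 
today, so this is only a sketch of the proposed semantics:

{code}
// Analogy only: overwrite bytes at a fixed offset without changing the file
// length, which is the constraint stated in the description below.
import java.io.IOException;
import java.io.RandomAccessFile;

public class InPlaceUpdateAnalogy {
  static void updateInPlace(String path, long offset, byte[] newBytes)
      throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
      if (offset + newBytes.length > raf.length()) {
        throw new IOException("update would change the file length");
      }
      raf.seek(offset);
      raf.write(newBytes); // e.g. "Hello World" -> "Hello HDFS!"
    }
  }
}
{code}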


> Advance Hadoop Architecture (AHA) - HDFS
> 
>
> Key: HDFS-9607
> URL: https://issues.apache.org/jira/browse/HDFS-9607
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Dinesh S. Atreya
>
> Link to Umbrella JIRA
> https://issues.apache.org/jira/browse/HADOOP-12620 
> Provide the capability to carry out in-place writes/updates. Only in-place 
> writes where the existing length does not change are supported.
> For example, "Hello World" can be replaced by "Hello HDFS!"
> See 
> https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300
>  for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086483#comment-15086483
 ] 

Wei-Chiu Chuang commented on HDFS-9619:
---

The failed tests appear to be flaky ones, unrelated to this patch.

Meanwhile, I ran TestBalancerWithMultipleNameNodes.testBalancer locally for 
more than 600 times so far without any failures.

> DataNode sometimes can not find blockpool for the correct namenode
> --
>
> Key: HDFS-9619
> URL: https://issues.apache.org/jira/browse/HDFS-9619
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: test
> Attachments: HDFS-9619.001.patch, HDFS-9619.002.patch
>
>
> We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
> replicate a file because a data node is excluded.
> {noformat}
> File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
> (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
> operation.
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> Relevant logs suggest the root cause is that the block pool was not found.
> {noformat}
> 2016-01-03 22:11:43,174 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
> BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(280)) - 
> host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: 
> /127.0.0.1:47318 dst: /127.0.0.1:49997
> java.io.IOException: Non existent blockpool 
> BP-1927700312-172.26.2.1-145188790
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> For a bit more context, this test starts a cluster with two name nodes and 
> one data node. The block pools are added, but one of them is not found after 
> being added. The root cause is undetected concurrent access to a hash map in 
> SimulatedFSDataset (two block pools are added simultaneously). I added some 
> logging to print blockMap and saw a few ConcurrentModificationExceptions. 
> The solution is to use a thread-safe class instead, such as 
> ConcurrentHashMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-8647:
-
Target Version/s: 2.6.4

> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch, 
> HDFS-8647-005.patch, HDFS-8647-006.patch, HDFS-8647-007.patch, 
> HDFS-8647-008.patch, HDFS-8647-009.patch, HDFS-8647-branch26.patch, 
> HDFS-8647-branch27.patch
>
>
> Sometimes we want to have namenode use alternative block placement policy 
> such as upgrade domains in HDFS-7541.
> BlockManager has built-in assumption about rack policy in functions such as 
> useDelHint, blockHasEnoughRacks. That means when we have new block placement 
> policy, we need to modify BlockManager to account for the new policy. Ideally 
> BlockManager should ask BlockPlacementPolicy object instead. That will allow 
> us to provide new BlockPlacementPolicy without changing BlockManager.
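
A hedged sketch of the delegation idea (names are simplified; this is not the 
actual patch):

{code}
// Instead of hard-coding rack logic in BlockManager, the policy object
// answers placement questions, so alternative policies (e.g. upgrade
// domains) can plug in without BlockManager changes.
public interface PlacementPolicySketch {
  /** Does this block's current rack spread satisfy the policy? */
  boolean hasEnoughRacks(int numRacks, int replication);
}

class DefaultRackPolicySketch implements PlacementPolicySketch {
  @Override
  public boolean hasEnoughRacks(int numRacks, int replication) {
    // Default HDFS rule: require more than one rack once replication > 1.
    return replication <= 1 || numRacks > 1;
  }
}
{code}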



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9314) Improve BlockPlacementPolicyDefault's picking of excess replicas

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-9314:
-
Target Version/s: 2.6.4

> Improve BlockPlacementPolicyDefault's picking of excess replicas
> 
>
> Key: HDFS-9314
> URL: https://issues.apache.org/jira/browse/HDFS-9314
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Xiao Chen
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: HDFS-9314.001.patch, HDFS-9314.002.patch, 
> HDFS-9314.003.patch, HDFS-9314.004.patch, HDFS-9314.005.patch, 
> HDFS-9314.006.patch, HDFS-9314.branch26.patch, HDFS-9314.branch27.patch
>
>
> The test case used in HDFS-9313 identified a NullPointerException as well as 
> a limitation of excess replica picking. If the current replicas are on 
> {SSD(rack r1), DISK(rack 2), DISK(rack 3), DISK(rack 3)} and the storage 
> policy changes to HOT_STORAGE_POLICY_ID, BlockPlacementPolicyDefault won't 
> be able to delete the SSD replica.
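
A hedged, self-contained illustration of the picking rule implied above (my 
own types and names, not BlockPlacementPolicyDefault):

{code}
// When choosing an excess replica to delete, prefer one whose storage type
// the current policy no longer wants -- here the SSD replica once the
// policy becomes HOT (all DISK).
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ExcessReplicaPickSketch {
  enum Storage { DISK, SSD }

  public static void main(String[] args) {
    List<Storage> replicas = Arrays.asList(
        Storage.SSD, Storage.DISK, Storage.DISK, Storage.DISK);
    Set<Storage> wantedByHot = EnumSet.of(Storage.DISK);
    Storage excess = replicas.stream()
        .filter(t -> !wantedByHot.contains(t))
        .findFirst()
        .orElse(replicas.get(0)); // fall back to any replica
    System.out.println("delete replica on: " + excess); // SSD
  }
}
{code}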



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-9313:
-
Target Version/s: 2.6.4

> Possible NullPointerException in BlockManager if no excess replica can be 
> chosen
> 
>
> Key: HDFS-9313
> URL: https://issues.apache.org/jira/browse/HDFS-9313
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: HDFS-9313-2.patch, HDFS-9313.branch26.patch, 
> HDFS-9313.branch27.patch, HDFS-9313.patch
>
>
> HDFS-8647 makes it easier to reason about various block placement scenarios. 
> Here is one possible case where BlockManager won't be able to find the excess 
> replica to delete: when the storage policy changes around the same time the 
> balancer moves the block. When this happens, it causes a NullPointerException.
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
> {noformat}
> Note that this hasn't been seen in any production cluster; it was found by 
> new unit tests. In addition, the issue existed before HDFS-8647.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086351#comment-15086351
 ] 

Hadoop QA commented on HDFS-9619:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 7s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 47s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 130m 0s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780814/HDFS-9619.002.patch |
| JIRA Issue | HDFS-9619 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 54993fa1453d 3.13.0-36-lowlat

[jira] [Commented] (HDFS-9620) Slow writer may fail permanently if pipeline breaks.

2016-01-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086340#comment-15086340
 ] 

Kihwal Lee commented on HDFS-9620:
--

The read path already has a mechanism for refetching block tokens, but it is 
currently not possible for writers to reacquire a block token for an existing 
block being written.
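
A toy illustration of the timing race (all numbers and names below are made 
up; the real token lifetime comes from configuration such as 
dfs.block.access.token.lifetime):

{code}
// A write that outlives its block token cannot set up a new pipeline:
// at recovery time the only token the writer holds is already expired.
public class TokenExpiryDemo {
  public static void main(String[] args) {
    long tokenLifetimeMs = 10 * 60 * 1000;    // example lifetime
    long issuedAtMs = 0;
    long pipelineBreakAtMs = 45 * 60 * 1000;  // slow writer, late failure
    boolean tokenStillValid = pipelineBreakAtMs < issuedAtMs + tokenLifetimeMs;
    System.out.println("token valid at recovery: " + tokenStillValid); // false
  }
}
{code}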

> Slow writer may fail permanently if pipeline breaks.
> 
>
> Key: HDFS-9620
> URL: https://issues.apache.org/jira/browse/HDFS-9620
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Critical
>
> During a block write to a datanode, if the block write time exceeds the block 
> token expiration, the client will not be able to reestablish a block output 
> stream. E.g. if a node in the pipeline dies, the pipeline recovery won't work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9620) Slow writer may fail permanently if pipeline breaks.

2016-01-06 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-9620:


 Summary: Slow writer may fail permanently if pipeline breaks.
 Key: HDFS-9620
 URL: https://issues.apache.org/jira/browse/HDFS-9620
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Priority: Critical


During a block write to a datanode, if the block write time exceeds the block 
token expiration, the client will not be able to reestablish a block output 
stream. E.g. if a node in the pipeline dies, the pipeline recovery won't work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2016-01-06 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086320#comment-15086320
 ] 

Elliott Clark commented on HDFS-6440:
-

+1 for branch-2 please.

> Support more than 2 NameNodes
> -
>
> Key: HDFS-6440
> URL: https://issues.apache.org/jira/browse/HDFS-6440
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: auto-failover, ha, namenode
>Affects Versions: 2.4.0
>Reporter: Jesse Yates
>Assignee: Jesse Yates
> Fix For: 3.0.0
>
> Attachments: Multiple-Standby-NameNodes_V1.pdf, 
> hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
> hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
> hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
> hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than 2 NameNodes (today: 
> one active, one standby). This would be the last bit to support running 
> multiple _standby_ NameNodes; one of the standbys should be available for 
> fail-over.
> Mostly, this is a matter of updating how we parse configurations, some 
> complexity around managing the checkpointing, and updating a whole lot of 
> tests.
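
For reference, a hedged sketch of the configuration shape with a third 
NameNode (the keys are the standard HA configuration keys; the nameservice 
name and hosts are examples):

{code}
import org.apache.hadoop.conf.Configuration;

public class ThreeNameNodeConfSketch {
  static Configuration build() {
    Configuration conf = new Configuration();
    conf.set("dfs.nameservices", "mycluster");
    conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2,nn3");
    conf.set("dfs.namenode.rpc-address.mycluster.nn1", "host1:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn2", "host2:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn3", "host3:8020");
    return conf;
  }
}
{code}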



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3599) Better expose when under-construction files are preventing DN decommission

2016-01-06 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086285#comment-15086285
 ] 

Andrew Wang commented on HDFS-3599:
---

I think that same check is still in DecommissionManager#isSufficient:

{code}
  if (bc.isUnderConstruction() && block.equals(bc.getLastBlock())) {
// Can decom a UC block as long as there will still be minReplicas
if (blockManager.hasMinStorage(block, numLive)) {
  LOG.trace("UC block {} sufficiently-replicated since numLive ({}) "
  + ">= minR ({})", block, numLive,
  blockManager.getMinStorageNum(block));
  return true;
{code}

Looking at the HDFS-7411 diff, it did not change the unit test introduced by 
HDFS-5579, so I think it was carried over correctly.

The high-level point is that open files block decommission. If you try to 
decommission the 3 nodes that are writing the 3 replicas of a block, we can't 
drop below minReplication and still be able to complete the block. So, 
decommission will wait on 3 minus minReplication of the nodes (e.g. with 
minReplication = 1, two of the three nodes can proceed while one must wait).

DecommissionManager right now has tons of debug/trace prints for these kinds 
of issues. It'd be good to expose this as a metric or something, so it can be 
easily queried by admins.

That, or we solve it once and for all by actively re-routing clients away from 
decommissioning nodes. There are a number of ideas for how we might do this.

> Better expose when under-construction files are preventing DN decommission
> --
>
> Key: HDFS-3599
> URL: https://issues.apache.org/jira/browse/HDFS-3599
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Zhe Zhang
>
> Filing on behalf of Konstantin Olchanski:
> {quote}
> I have been trying to decommission a data node, but the process
> stalled. I followed the correct instructions, observed my node
> listed in "Decommissioning Nodes", etc., observed "Under Replicated Blocks"
> decrease, etc. But the count went down to "1" and the decommission process 
> stalled.
> There was no visible activity anywhere, nothing was happening (well,
> maybe in some hidden log file somewhere something complained,
> but I did not look).
> It turns out that I had some files stuck in "OPENFORWRITE" mode,
> as reported by "hdfs fsck / -openforwrite -files -blocks -locations -racks":
> {code}
> /users/trinat/data/.fuse_hidden177e0002 0 bytes, 0 block(s), 
> OPENFORWRITE:  OK
> /users/trinat/data/.fuse_hidden178d0003 0 bytes, 0 block(s), 
> OPENFORWRITE:  OK
> /users/trinat/data/.fuse_hidden1da30004 0 bytes, 1 block(s), 
> OPENFORWRITE:  OK
> 0. 
> BP-88378204-142.90.119.126-1340494203431:blk_6980480609696383665_20259{blockUCState=UNDER_CONSTRUCTION,
>  primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[142.90.111.72:50010|RBW], 
> ReplicaUnderConstruction[142.90.119.162:50010|RBW], 
> ReplicaUnderConstruction[142.90.119.126:50010|RBW]]} len=0 repl=3 
> [/detfac/142.90.111.72:50010, /isac2/142.90.119.162:50010, 
> /isac2/142.90.119.126:50010]
> {code}
> After I deleted those files, the decommission process completed successfully.
> Perhaps one can add some visible indication somewhere on the HDFS status web 
> page that the decommission process is stalled, and maybe report why it is 
> stalled? Maybe the number of "OPENFORWRITE" files should be listed on the 
> status page next to the "Number of Under-Replicated Blocks"? (Since I know 
> that nobody is writing to my HDFS, the non-zero count would give me a clue 
> that something is wrong.)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086281#comment-15086281
 ] 

Wei-Chiu Chuang commented on HDFS-9619:
---

TestBlockReplacement.testBlockReplacement is a flaky test that often fails.
TestBlockStoragePolicy.testChangeHotFileRep appears to be a flaky test too.

> DataNode sometimes can not find blockpool for the correct namenode
> --
>
> Key: HDFS-9619
> URL: https://issues.apache.org/jira/browse/HDFS-9619
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: test
> Attachments: HDFS-9619.001.patch, HDFS-9619.002.patch
>
>
> We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
> replicate a file because a data node is excluded.
> {noformat}
> File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
> (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
> operation.
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> Relevant logs suggest the root cause is that the block pool was not found.
> {noformat}
> 2016-01-03 22:11:43,174 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
> BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(280)) - 
> host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: 
> /127.0.0.1:47318 dst: /127.0.0.1:49997
> java.io.IOException: Non existent blockpool 
> BP-1927700312-172.26.2.1-145188790
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> For a bit more context, this test starts a cluster with two name nodes and 
> one data node. The block pools are added, but one of them is not found after 
> being added. The root cause is undetected concurrent access to a hash map in 
> SimulatedFSDataset (two block pools are added simultaneously). I added some 
> logging to print blockMap and saw a few ConcurrentModificationExceptions. 
> The solution is to use a thread-safe class instead, such as 
> ConcurrentHashMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086271#comment-15086271
 ] 

Hadoop QA commented on HDFS-9618:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 56s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 54m 36s 
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 140m 42s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130 |
|   | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780805/HDFS-9618.001.patch |
| JIRA Issue | HDFS-9618 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvnin

[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086225#comment-15086225
 ] 

Hadoop QA commented on HDFS-9619:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 32s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 50m 30s 
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 130m 1s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.datanode.TestBlockReplacement |
|   | hadoop.hdfs.TestBlockStoragePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780801/HDFS-9619.001.patch |
| JIRA Issue | HDFS-9619 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 30744bddc127 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precomm

[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086145#comment-15086145
 ] 

Wei-Chiu Chuang commented on HDFS-9619:
---

Well, maybe the test case is not needed. It's pretty obvious there is a 
concurrency bug: a HashMap is used without a synchronized block.
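
A minimal, self-contained sketch of that failure mode (this is not the 
SimulatedFSDataset code; the map contents are stand-ins for block pools):

{code}
// Two threads adding entries to a plain HashMap can lose updates or corrupt
// the map, and iterating it concurrently throws
// ConcurrentModificationException; swapping in ConcurrentHashMap fixes both.
import java.util.HashMap;
import java.util.Map;

public class BlockPoolMapDemo {
  public static void main(String[] args) throws InterruptedException {
    final Map<String, Object> map = new HashMap<>(); // unsafe
    // final Map<String, Object> map =
    //     new java.util.concurrent.ConcurrentHashMap<>(); // safe
    Runnable adder = () -> {
      for (int i = 0; i < 100_000; i++) {
        map.put(Thread.currentThread().getName() + "-bp-" + i, new Object());
      }
    };
    Thread t1 = new Thread(adder, "nn1");
    Thread t2 = new Thread(adder, "nn2");
    t1.start(); t2.start();
    t1.join(); t2.join();
    // With HashMap this is often less than 200000 (lost updates); with
    // ConcurrentHashMap it is always exactly 200000.
    System.out.println("entries: " + map.size());
  }
}
{code}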

> DataNode sometimes can not find blockpool for the correct namenode
> --
>
> Key: HDFS-9619
> URL: https://issues.apache.org/jira/browse/HDFS-9619
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: test
> Attachments: HDFS-9619.001.patch, HDFS-9619.002.patch
>
>
> We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
> replicate a file because a data node is excluded.
> {noformat}
> File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
> (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
> operation.
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> Relevant logs suggest the root cause is that the block pool was not found.
> {noformat}
> 2016-01-03 22:11:43,174 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
> BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(280)) - 
> host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: 
> /127.0.0.1:47318 dst: /127.0.0.1:49997
> java.io.IOException: Non existent blockpool 
> BP-1927700312-172.26.2.1-145188790
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> For a bit more context, this test starts a cluster with two name nodes and 
> one data node. The block pools are added, but one of them is not found after 
> being added. The root cause is undetected concurrent access to a hash map in 
> SimulatedFSDataset (two block pools are added simultaneously). I added some 
> logging to print blockMap and saw a few ConcurrentModificationExceptions. 
> The solution is to use a thread-safe class instead, such as 
> ConcurrentHashMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9617) my java client use muti-thread to put a same file to a same hdfs uri, after no lease error,then client OutOfMemoryError

2016-01-06 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-9617.
--
Resolution: Invalid

> my java client use muti-thread to put a same file to a same hdfs uri, after 
> no lease error,then client OutOfMemoryError
> ---
>
> Key: HDFS-9617
> URL: https://issues.apache.org/jira/browse/HDFS-9617
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zuotingbing
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /Tmp2/43.bmp.tmp (inode 2913263): File does not exist. [Lease.  
> Holder: DFSClient_NONMAPREDUCE_2084151715_1, pendingcreates: 250]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3358)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3160)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3042)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1653)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:391)
>   at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1473)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1290)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:536)
> my java client (JVM -Xmx=2G):
> jmap TOP15:
> num #instances #bytes  class name
> --
>1: 48072  2053976792  [B
>2: 45852  5987568  
>3: 45852  5878944  
>4:  3363  4193112  
>5:  3363  2548168  
>6:  2733  2299008  
>7:   533  2191696  [Ljava.nio.ByteBuffer;
>8: 24733  2026600  [C
>9: 31287  2002368  org.apache.hadoop.hdfs.DFSOutputStream$Packet
>   10: 31972  767328  java.util.LinkedList$Node
>   11: 22845  548280  java.lang.String
>   12: 20372  488928  java.util.concurrent.atomic.AtomicLong
>   13:  3700  452984  java.lang.Class
>   14:   981  439576  
>   15:  5583  376344  [S



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9617) my java client use muti-thread to put a same file to a same hdfs uri, after no lease error,then client OutOfMemoryError

2016-01-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086124#comment-15086124
 ] 

Kihwal Lee commented on HDFS-9617:
--

bq. my java client use muti-thread to put a same file to a same hdfs uri
Unless each thread creates a separate instance of UserGroupInformation for its 
file system, they will all look like one writer to the namenode, causing all 
sorts of problems.
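
One hedged way to get the per-thread isolation Kihwal describes (he suggests 
separate UserGroupInformation instances; FileSystem.newInstance is another 
standard route to avoid sharing the cached FileSystem; the URI and paths below 
are examples):

{code}
// Each thread gets its own FileSystem (and thus its own DFSClient / lease
// holder name) instead of the shared cached instance, and writes to a
// distinct path -- two writers creating the same path would still fight
// over the lease regardless.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerThreadWriter implements Runnable {
  @Override
  public void run() {
    try {
      FileSystem fs = FileSystem.newInstance(
          URI.create("hdfs://nn:8020"), new Configuration());
      try {
        Path out = new Path("/tmp/out-" + Thread.currentThread().getName());
        fs.create(out).close();
      } finally {
        fs.close();
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
{code}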

> my java client use muti-thread to put a same file to a same hdfs uri, after 
> no lease error,then client OutOfMemoryError
> ---
>
> Key: HDFS-9617
> URL: https://issues.apache.org/jira/browse/HDFS-9617
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zuotingbing
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /Tmp2/43.bmp.tmp (inode 2913263): File does not exist. [Lease.  
> Holder: DFSClient_NONMAPREDUCE_2084151715_1, pendingcreates: 250]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3358)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3160)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3042)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1653)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:391)
>   at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1473)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1290)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:536)
> my java client (JVM -Xmx=2G):
> jmap TOP15:
> num #instances #bytes  class name
> --
>1: 48072  2053976792  [B
>2: 45852  5987568  
>3: 45852  5878944  
>4:  3363  4193112  
>5:  3363  2548168  
>6:  2733  2299008  
>7:   533  2191696  [Ljava.nio.ByteBuffer;
>8: 24733  2026600  [C
>9: 31287  2002368  org.apache.hadoop.hdfs.DFSOutputStream$Packet
>   10: 31972  767328  java.util.LinkedList$Node
>   11: 22845  548280  java.lang.String
>   12: 20372  488928  java.util.concurrent.atomic.AtomicLong
>   13:  3700  452984  java.lang.Class
>   14:   981  439576  
>   15:  5583  376344  [S



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9576) HTrace: collect position/length information on read operations

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086111#comment-15086111
 ] 

Hadoop QA commented on HDFS-9576:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-hdfs-project/hadoop-hdfs-client (total was 136, now 137). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 13s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780808/HDFS-9576.04.patch |
| JIRA Issue | HDFS-9576 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 6c850e486918 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
|

[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9619:
--
Attachment: HDFS-9619.002.patch

Rev02: Added a test case.
The test case, {{TestSimulatedFSDataset.testConcurrentAddBlockPool()}}, starts 
two threads that add different block pools concurrently and then attempt to 
add a block into their pool. If a block pool is not found, an IOException is 
thrown.

Without the rev01 patch that uses ConcurrentHashMap, this test case always 
fails because it cannot find a block pool that was just added; after the 
patch, I am not seeing any failures. A minimal sketch of the test's shape is 
below.
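
The test's shape, as a minimal self-contained sketch (the class and the block 
map here are stand-ins, not the actual {{SimulatedFSDataset}} or test code):
{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class ConcurrentAddBlockPoolSketch {
  // Stand-in for SimulatedFSDataset's block map. With a plain HashMap,
  // concurrent addBlockPool() calls can corrupt the map so that a pool that
  // was just added is not found; the rev01 fix swaps in a ConcurrentHashMap.
  static final Map<String, Map<Long, byte[]>> blockMap = new HashMap<>();

  static void addBlockPool(String bpid) {
    blockMap.put(bpid, new HashMap<>());
  }

  static void addBlock(String bpid, long blkId) throws IOException {
    Map<Long, byte[]> pool = blockMap.get(bpid);
    if (pool == null) {
      throw new IOException("Non existent blockpool " + bpid);
    }
    pool.put(blkId, new byte[0]);
  }

  public static void main(String[] args) throws Exception {
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[2];
    for (int i = 0; i < workers.length; i++) {
      final String bpid = "BP-" + i;
      workers[i] = new Thread(() -> {
        try {
          start.await();       // release both threads at the same time
          addBlockPool(bpid);  // concurrent mutation of blockMap
          addBlock(bpid, 1L);  // throws if the pool was lost
        } catch (IOException | InterruptedException e) {
          throw new AssertionError("block pool lost: " + bpid, e);
        }
      });
      workers[i].start();
    }
    start.countDown();
    for (Thread w : workers) {
      w.join();
    }
  }
}
{code}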

> DataNode sometimes can not find blockpool for the correct namenode
> --
>
> Key: HDFS-9619
> URL: https://issues.apache.org/jira/browse/HDFS-9619
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: test
> Attachments: HDFS-9619.001.patch, HDFS-9619.002.patch
>
>
> We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
> replicate a file because a data node is excluded.
> {noformat}
> File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
> (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
> operation.
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> Relevant logs suggest the root cause is that a block pool was not found.  
> {noformat}
> 2016-01-03 22:11:43,174 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
> BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(280)) - 
> host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: 
> /127.0.0.1:47318 dst: /127.0.0.1:49997
> java.io.IOException: Non existent blockpool 
> BP-1927700312-172.26.2.1-145188790
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> For a bit more context, this test starts a cluster with two name nodes and 
> one data node. The block pools are added, but one of them is not found after 
> being added. The root cause is undetected concurrent access to a hash map in 
> SimulatedFSDataset (two block pools are added simultaneously). I added some 
> logs to print blockMap and saw a few ConcurrentModificationExceptions. The 
> solution is to use a thread-safe class instead, such as ConcurrentHashMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-06 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086101#comment-15086101
 ] 

Jing Zhao commented on HDFS-8999:
-

# How about making BlockNotYetCompleteException simply an IOException and then 
in {{appendFile}} wrapping it inside a {{RetriableException}} (like the 
current {{checkNameNodeSafeMode}})? In this way we can depend on the existing 
retry logic for {{RetriableException}} and do not need explicit retry in 
{{callAppend}}.
# We may need a unit test for the append retry in a block-not-yet-complete 
scenario.
# In {{commitOrCompleteLastBlock}} and {{addStoredBlock}}, it looks like we do 
not need the {{hasMinStorage}} check when adding the replicas to the pending 
queue? Otherwise the block may later be put into the under-replicated queue 
with {{QUEUE_WITH_CORRUPT_BLOCKS}} priority. If this change makes sense to 
you, we may also need another unit test here.
{code}
if (hasMinStorage(lastBlock)) {
  if (b) {
    addExpectedReplicasToPending(lastBlock, bc);
  }
}
{code}
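
A minimal sketch of the wrapping idea in point 1 (the method shape and message 
are assumed for illustration, and a plain IOException stands in for the 
proposed BlockNotYetCompleteException; only {{RetriableException}} and the 
analogy to {{checkNameNodeSafeMode}} come from the discussion above):
{code}
import java.io.IOException;

import org.apache.hadoop.ipc.RetriableException;

public class AppendRetrySketch {
  /**
   * Would be called from appendFile(): if the last block is not yet complete,
   * surface the condition as a RetriableException so the client's existing
   * retry policy handles it, instead of an explicit retry loop in
   * callAppend().
   */
  static void checkLastBlockComplete(boolean lastBlockComplete, String src)
      throws RetriableException {
    if (!lastBlockComplete) {
      throw new RetriableException(
          new IOException("Last block of " + src + " is not yet complete"));
    }
  }
}
{code}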

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8999_20151228.patch, h8999_20160106.patch, 
> h8999_20160106b.patch, h8999_20160106c.patch
>
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let NameNode wait for all the block_received msgs to 
> announce the replica is safe. Looking into the code, now we have
># NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
># If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
># When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts in all the replicas that NN has already 
> received the block_received msg, but based on the above #2 and #3, it may be 
> safe to also count in all the replicas in the 
> BlockUnderConstructionFeature#replicas?
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9498) Move code that tracks blocks with future generation stamps to BlockManagerSafeMode

2016-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086076#comment-15086076
 ] 

Hudson commented on HDFS-9498:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #9058 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9058/])
HDFS-9498. Move code that tracks blocks with future generation stamps to (arp: 
rev 67c9780609f707c11626f05028ddfd28f1b878f1)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogRace.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManagerSafeMode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerSafeMode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> Move code that tracks blocks with future generation stamps to 
> BlockManagerSafeMode
> --
>
> Key: HDFS-9498
> URL: https://issues.apache.org/jira/browse/HDFS-9498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, 
> HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch
>
>
> [HDFS-4015] counts and reports orphaned blocks 
> ({{numberOfBytesInFutureBlocks}}) in safe mode. It was implemented in 
> {{BlockManager}}. Per the discussion in [HDFS-9129], which introduces 
> {{BlockManagerSafeMode}}, we can move the code that maintains orphaned 
> blocks to that class.
> The check for blocks with future GS when leaving safe mode lives in 
> {{FSNamesystem}}; this code can also be moved to {{BlockManagerSafeMode}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9617) my java client uses multiple threads to put the same file to the same HDFS URI; after a no-lease error, the client hits OutOfMemoryError

2016-01-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086069#comment-15086069
 ] 

Mingliang Liu commented on HDFS-9617:
-

If this is not yet confirmed as a bug or a feature request, please send an 
email to [mailto:u...@hadoop.apache.org]. People there are willing to help you 
with your problem.

> my java client uses multiple threads to put the same file to the same HDFS 
> URI; after a no-lease error, the client hits OutOfMemoryError
> ---
>
> Key: HDFS-9617
> URL: https://issues.apache.org/jira/browse/HDFS-9617
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zuotingbing
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /Tmp2/43.bmp.tmp (inode 2913263): File does not exist. [Lease.  
> Holder: DFSClient_NONMAPREDUCE_2084151715_1, pendingcreates: 250]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3358)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3160)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3042)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1653)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:391)
>   at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1473)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1290)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:536)
> my java client (JVM -Xmx=2G):
> jmap TOP15:
> num #instances #bytes  class name
> --
>1: 48072 2053976792  [B
>2: 45852 5987568  
>3: 45852 5878944  
>4:  3363 4193112  
>5:  3363 2548168  
>6:  2733 2299008  
>7:   533 2191696  [Ljava.nio.ByteBuffer;
>8: 24733 2026600  [C
>9: 31287 2002368  
> org.apache.hadoop.hdfs.DFSOutputStream$Packet
>   10: 31972 767328  java.util.LinkedList$Node
>   11: 22845 548280  java.lang.String
>   12: 20372 488928  java.util.concurrent.atomic.AtomicLong
>   13:  3700 452984  java.lang.Class
>   14:   981 439576  
>   15:  5583 376344  [S



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9618:

Component/s: namenode

> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9618.001.patch
>
>
> The debug log message is constructed only when {{Logger#isInfoEnabled}} returns true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086063#comment-15086063
 ] 

Mingliang Liu commented on HDFS-9618:
-

+1 (non-binding)

> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9618.001.patch
>
>
> The debug log message is constructed only when {{Logger#isInfoEnabled}} returns true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9498) Move code that tracks blocks with future generation stamps to BlockManagerSafeMode

2016-01-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086055#comment-15086055
 ] 

Mingliang Liu commented on HDFS-9498:
-

Thank you [~arpitagarwal] for your insightful comments and the commit. Thanks 
to [~anu] for his original work on tracking blocks with future GS, and for the 
code review.

> Move code that tracks blocks with future generation stamps to 
> BlockManagerSafeMode
> --
>
> Key: HDFS-9498
> URL: https://issues.apache.org/jira/browse/HDFS-9498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, 
> HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch
>
>
> [HDFS-4015] counts and reports orphaned blocks 
> ({{numberOfBytesInFutureBlocks}}) in safe mode. It was implemented in 
> {{BlockManager}}. Per the discussion in [HDFS-9129], which introduces 
> {{BlockManagerSafeMode}}, we can move the code that maintains orphaned 
> blocks to that class.
> The check for blocks with future GS when leaving safe mode lives in 
> {{FSNamesystem}}; this code can also be moved to {{BlockManagerSafeMode}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086050#comment-15086050
 ] 

Mingliang Liu commented on HDFS-9618:
-

We have 5 levels of priority queues, so the aggregation should be fast. 
Leaving it as-is may be better, though.

> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9618.001.patch
>
>
> The debug log message is constructed only when {{Logger#isInfoEnabled}} returns true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9498) Move code that tracks blocks with future generation stamps to BlockManagerSafeMode

2016-01-06 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9498:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2 for 2.9.0. The branch-2.8 conflicts looked 
non-trivial, so I skipped it for 2.8.0.

Thanks for the contribution [~liuml07] and for the review [~anu].

> Move code that tracks blocks with future generation stamps to 
> BlockManagerSafeMode
> --
>
> Key: HDFS-9498
> URL: https://issues.apache.org/jira/browse/HDFS-9498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, 
> HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch
>
>
> [HDFS-4015] counts and reports orphaned blocks 
> ({{numberOfBytesInFutureBlocks}}) in safe mode. It was implemented in 
> {{BlockManager}}. Per the discussion in [HDFS-9129], which introduces 
> {{BlockManagerSafeMode}}, we can move the code that maintains orphaned 
> blocks to that class.
> The check for blocks with future GS when leaving safe mode lives in 
> {{FSNamesystem}}; this code can also be moved to {{BlockManagerSafeMode}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086043#comment-15086043
 ] 

Mingliang Liu commented on HDFS-9618:
-

Thanks for working on this, [~iwasakims].

Calling {{neededReplications.size()}} or {{pendingReplications.size()}} seems 
to have low overhead, so that guard can be removed, as [~drankye] proposed. 
The guard around the code that logs which blocks have been scheduled for 
replication should be kept, since that code iterates over all recovery work 
and its targets.
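
For context, a self-contained sketch of the guard/level mismatch this issue 
fixes (the logger and queue names are illustrative, not the actual 
{{BlockManager}} code):
{code}
import java.util.ArrayList;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuardMismatchSketch {
  static final Logger blockLog = LoggerFactory.getLogger("BlockStateChange");
  static final List<String> neededReplications = new ArrayList<>();
  static final List<String> pendingReplications = new ArrayList<>();

  public static void main(String[] args) {
    // Mismatch: the guard checks whether INFO is enabled, but the message is
    // logged at DEBUG. At level INFO the guard passes yet debug() prints
    // nothing; at level DEBUG the guard is redundant.
    if (blockLog.isInfoEnabled()) {
      blockLog.debug("BLOCK* neededReplications = {} pendingReplications = {}",
          neededReplications.size(), pendingReplications.size());
    }

    // A consistent alternative: drop the guard. SLF4J's {} placeholders defer
    // string formatting, and size() is cheap enough here (per the comment
    // above) that evaluating the arguments unguarded is acceptable.
    blockLog.debug("BLOCK* neededReplications = {} pendingReplications = {}",
        neededReplications.size(), pendingReplications.size());
  }
}
{code}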

> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9618.001.patch
>
>
> The debug log message is constructed only when {{Logger#isInfoEnabled}} returns true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9576) HTrace: collect position/length information on read operations

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086033#comment-15086033
 ] 

Hadoop QA commented on HDFS-9576:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-hdfs-project/hadoop-hdfs-client (total was 136, now 137). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 12s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780803/HDFS-9576.03.patch |
| JIRA Issue | HDFS-9576 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 68830c147581 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Bu

[jira] [Updated] (HDFS-9576) HTrace: collect position/length information on read operations

2016-01-06 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9576:

Attachment: HDFS-9576.04.patch

Thanks Xiao for the good catch! Updating the patch to address it. Also 
renaming {{readScope}} to {{scope}} to be consistent with other places that 
use temporary scope variables.

> HTrace: collect position/length information on read operations
> --
>
> Key: HDFS-9576
> URL: https://issues.apache.org/jira/browse/HDFS-9576
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, tracing
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, 
> HDFS-9576.02.patch, HDFS-9576.03.patch, HDFS-9576.04.patch
>
>
> HTrace currently collects the path of each read operation (both stateful and 
> position reads). To better understand applications' I/O behavior, it is also 
> useful to track the position and length of read operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-9618:
---
Affects Version/s: (was: 3.0.0)
   2.8.0
   Status: Patch Available  (was: Open)

> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9618.001.patch
>
>
> The debug log message is constructed only when {{Logger#isInfoEnabled}} returns true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-9618:
---
Attachment: HDFS-9618.001.patch

Thanks for the comment, [~drankye]. I attached 001.

bq. Then is there any reason for the following block?

The reason seems to be that {{UnderReplicatedBlocks#size}} is not just an 
accessor but does some aggregation. I left that part as-is in the attached 
patch.


> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9618.001.patch
>
>
> The debug log message is constructed only when {{Logger#isInfoEnabled}} returns true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9576) HTrace: collect position/length information on read operations

2016-01-06 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086016#comment-15086016
 ] 

Xiao Chen commented on HDFS-9576:
-

Thanks for the work, and sorry for jumping in.
{code}
scope.addKVAnnotation("requiredLength", Integer.toString(reqLen));
{code}
Should the key be "requestedLength"?
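
For reference, a sketch of how such an annotation could look on a positional 
read ({{addKVAnnotation}} is taken from the patch snippet above; the method 
shape, scope name, and keys are assumptions, not the committed code):
{code}
import org.apache.htrace.core.TraceScope;
import org.apache.htrace.core.Tracer;

public class ReadTraceSketch {
  // Annotate a pread-style operation with path, position, and length so the
  // collected spans describe the application's I/O pattern.
  static byte[] tracedPread(Tracer tracer, String path, long pos, int reqLen) {
    try (TraceScope scope = tracer.newScope("DFSInputStream#pread")) {
      scope.addKVAnnotation("path", path);
      scope.addKVAnnotation("position", Long.toString(pos));
      scope.addKVAnnotation("requestedLength", Integer.toString(reqLen));
      return new byte[reqLen]; // stand-in for the actual read
    }
  }
}
{code}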

> HTrace: collect position/length information on read operations
> --
>
> Key: HDFS-9576
> URL: https://issues.apache.org/jira/browse/HDFS-9576
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, tracing
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, 
> HDFS-9576.02.patch, HDFS-9576.03.patch
>
>
> HTrace currently collects the path of each read operation (both stateful and 
> position reads). To better understand applications' I/O behavior, it is also 
> useful to track the position and length of read operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9498) Move code that tracks blocks with future generation stamps to BlockManagerSafeMode

2016-01-06 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9498:

Summary: Move code that tracks blocks with future generation stamps to 
BlockManagerSafeMode  (was: Move code that tracks orphan blocks to 
BlockManagerSafeMode)

> Move code that tracks blocks with future generation stamps to 
> BlockManagerSafeMode
> --
>
> Key: HDFS-9498
> URL: https://issues.apache.org/jira/browse/HDFS-9498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, 
> HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch
>
>
> [HDFS-4015] counts and reports orphaned blocks 
> ({{numberOfBytesInFutureBlocks}}) in safe mode. It was implemented in 
> {{BlockManager}}. Per the discussion in [HDFS-9129], which introduces 
> {{BlockManagerSafeMode}}, we can move the code that maintains orphaned 
> blocks to that class.
> The check for blocks with future GS when leaving safe mode lives in 
> {{FSNamesystem}}; this code can also be moved to {{BlockManagerSafeMode}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085978#comment-15085978
 ] 

Masatake Iwasaki commented on HDFS-9618:


This was wrong. The log level was changed by HDFS-6860.

> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>
> The debug log message is constructed only when {{Logger#isInfoEnabled}} returns true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9576) HTrace: collect path/offset/length information on read operations

2016-01-06 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9576:

Description: HTrace currently collects the path of each read operation 
(both stateful and position reads). To better understand applications' I/O 
behavior, it is also useful to track the position and length of read operations.

> HTrace: collect path/offset/length information on read operations
> -
>
> Key: HDFS-9576
> URL: https://issues.apache.org/jira/browse/HDFS-9576
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, tracing
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, 
> HDFS-9576.02.patch, HDFS-9576.03.patch
>
>
> HTrace currently collects the path of each read operation (both stateful and 
> position reads). To better understand applications' I/O behavior, it is also 
> useful to track the position and length of read operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9576) HTrace: collect position/length information on read operations

2016-01-06 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9576:

Summary: HTrace: collect position/length information on read operations  
(was: HTrace: collect path/offset/length information on read operations)

> HTrace: collect position/length information on read operations
> --
>
> Key: HDFS-9576
> URL: https://issues.apache.org/jira/browse/HDFS-9576
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, tracing
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, 
> HDFS-9576.02.patch, HDFS-9576.03.patch
>
>
> HTrace currently collects the path of each read operation (both stateful and 
> position reads). To better understand applications' I/O behavior, it is also 
> useful to track the position and length of read operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9576) HTrace: collect path/offset/length information on read operations

2016-01-06 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9576:

Attachment: HDFS-9576.03.patch

Good catch! Updating the patch to address it.

> HTrace: collect path/offset/length information on read operations
> -
>
> Key: HDFS-9576
> URL: https://issues.apache.org/jira/browse/HDFS-9576
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, tracing
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, 
> HDFS-9576.02.patch, HDFS-9576.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9576) HTrace: collect path/offset/length information on read operations

2016-01-06 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9576:

Summary: HTrace: collect path/offset/length information on read operations  
(was: HTrace: collect path/offset/length information on read and write 
operations)

> HTrace: collect path/offset/length information on read operations
> -
>
> Key: HDFS-9576
> URL: https://issues.apache.org/jira/browse/HDFS-9576
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, tracing
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, 
> HDFS-9576.02.patch, HDFS-9576.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9619:
--
Component/s: test
 datanode

> DataNode sometimes can not find blockpool for the correct namenode
> --
>
> Key: HDFS-9619
> URL: https://issues.apache.org/jira/browse/HDFS-9619
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: test
> Attachments: HDFS-9619.001.patch
>
>
> We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
> replicate a file because a data node is excluded.
> {noformat}
> File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
> (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
> operation.
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> Relevant logs suggest the root cause is that a block pool was not found.  
> {noformat}
> 2016-01-03 22:11:43,174 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
> BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(280)) - 
> host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: 
> /127.0.0.1:47318 dst: /127.0.0.1:49997
> java.io.IOException: Non existent blockpool 
> BP-1927700312-172.26.2.1-145188790
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> For a bit more context, this test starts a cluster with two name nodes and 
> one data node. The block pools are added, but one of them is not found after 
> being added. The root cause is undetected concurrent access to a hash map in 
> SimulatedFSDataset (two block pools are added simultaneously). I added some 
> logs to print blockMap and saw a few ConcurrentModificationExceptions. The 
> solution is to use a thread-safe class instead, such as ConcurrentHashMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9619:
--
Labels: test  (was: )

> DataNode sometimes can not find blockpool for the correct namenode
> --
>
> Key: HDFS-9619
> URL: https://issues.apache.org/jira/browse/HDFS-9619
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: test
> Attachments: HDFS-9619.001.patch
>
>
> We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
> replicate a file because a data node is excluded.
> {noformat}
> File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
> (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
> operation.
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> Relevant logs suggest the root cause is that a block pool was not found.  
> {noformat}
> 2016-01-03 22:11:43,174 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
> BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(280)) - 
> host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: 
> /127.0.0.1:47318 dst: /127.0.0.1:49997
> java.io.IOException: Non existent blockpool 
> BP-1927700312-172.26.2.1-145188790
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> For a bit more context, this test starts a cluster with two name nodes and 
> one data node. The block pools are added, but one of them is not found after 
> being added. The root cause is undetected concurrent access to a hash map in 
> SimulatedFSDataset (two block pools are added simultaneously). I added some 
> logs to print blockMap and saw a few ConcurrentModificationExceptions. The 
> solution is to use a thread-safe class instead, such as ConcurrentHashMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9619:
--
Status: Patch Available  (was: Open)

> DataNode sometimes can not find blockpool for the correct namenode
> --
>
> Key: HDFS-9619
> URL: https://issues.apache.org/jira/browse/HDFS-9619
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9619.001.patch
>
>
> We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
> replicate a file because a data node is excluded.
> {noformat}
> File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
> (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
> operation.
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> Relevant logs suggest the root cause is that a block pool was not found.  
> {noformat}
> 2016-01-03 22:11:43,174 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
> BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(280)) - 
> host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: 
> /127.0.0.1:47318 dst: /127.0.0.1:49997
> java.io.IOException: Non existent blockpool 
> BP-1927700312-172.26.2.1-145188790
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> For a bit more context, this test starts a cluster with two name nodes and 
> one data node. The block pools are added, but one of them is not found after 
> being added. The root cause is undetected concurrent access to a hash map in 
> SimulatedFSDataset (two block pools are added simultaneously). I added some 
> logs to print blockMap and saw a few ConcurrentModificationExceptions. The 
> solution is to use a thread-safe class instead, such as ConcurrentHashMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9619:
--
Description: 
We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
replicate a file because a data node is excluded.

{noformat}
File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
(=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
operation.
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
{noformat}

Relevant logs suggest the root cause is that a block pool was not found.  
{noformat}
2016-01-03 22:11:43,174 [DataXceiver for client 
DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(280)) - host0.foo.com:49997:DataXceiver 
error processing WRITE_BLOCK operation src: /127.0.0.1:47318 dst: 
/127.0.0.1:49997
java.io.IOException: Non existent blockpool 
BP-1927700312-172.26.2.1-145188790
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
at java.lang.Thread.run(Thread.java:745)
{noformat}

For a bit more context, this test starts a cluster with two name nodes and one 
data node. The block pools are added, but one of them is not found after being 
added. The root cause is undetected concurrent access to a hash map in 
SimulatedFSDataset (two block pools are added simultaneously). I added some 
logs to print blockMap, and saw a few ConcurrentModificationExceptions. The 
solution would be to use a thread-safe class instead, such as ConcurrentHashMap.

  was:
We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
replicate a file because a data node is excluded.

{noformat}
File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
(=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
operation.
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:230

[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9619:
--
Attachment: HDFS-9619.001.patch

Rev01. Use ConcurrentHashMap instead of HashMap in SimulatedFSDataset to store 
block pools.

Tested locally. Before the patch, the test failed about 1 in 10 runs. After 
the patch, I've run it ~100 times without seeing any failures.
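
For illustration only, here is a minimal, self-contained sketch of the failure 
mode and the fix (the class name {{BlockPoolMapDemo}} and the value types are 
hypothetical; the real blockMap field in SimulatedFSDataset differs):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Two threads registering "block pools" in a shared map, mirroring how
// SimulatedFSDataset adds pools for two name nodes. With a plain HashMap
// these concurrent puts are unsafe and can corrupt the map or lose an
// entry; ConcurrentHashMap makes them safe.
public class BlockPoolMapDemo {
  // swap in "new HashMap<>()" here to get the unsafe behavior
  private static final Map<String, Map<Long, String>> blockMap =
      new ConcurrentHashMap<>();

  public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread(() -> blockMap.put("BP-1", new HashMap<>()));
    Thread t2 = new Thread(() -> blockMap.put("BP-2", new HashMap<>()));
    t1.start(); t2.start();
    t1.join(); t2.join();
    // With ConcurrentHashMap both pools are always present afterwards.
    System.out.println("pools registered: " + blockMap.keySet());
  }
}
{code}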

> DataNode sometimes can not find blockpool for the correct namenode
> --
>
> Key: HDFS-9619
> URL: https://issues.apache.org/jira/browse/HDFS-9619
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9619.001.patch
>
>
> We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
> replicate a file because a data node is excluded.
> {noformat}
> File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
> (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
> operation.
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> Relevant logs suggest the root cause is a block pool not being found.
> {noformat}
> 2016-01-03 22:11:43,174 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
> BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(280)) - 
> host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: 
> /127.0.0.1:47318 dst: /127.0.0.1:49997
> java.io.IOException: Non existent blockpool 
> BP-1927700312-172.26.2.1-145188790
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
> at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:203)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> For a bit more context, this test starts a cluster with two name nodes and 
> one data node. The block pools are added, but one of them is not found after 
> being added. The root cause is undetected concurrent access to a hash map in 
> SimulatedFSDataset (two block pools are added simultaneously). The solution 
> would be to use a thread-safe class instead, such as ConcurrentHashMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9615) Fix variable name typo in DFSConfigKeys

2016-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085950#comment-15085950
 ] 

Hudson commented on HDFS-9615:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #9057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9057/])
HDFS-9615. Fix variable name typo in DFSConfigKeys. (Contributed by Ray 
Chiang) (arp: rev b9936689c9ea37bf0050e7970643bcddfc9cfdbe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java


> Fix variable name typo in DFSConfigKeys
> ---
>
> Key: HDFS-9615
> URL: https://issues.apache.org/jira/browse/HDFS-9615
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HDFS-9615.001.patch
>
>
> Ran across this typo in the variable name:
> DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT
> should clearly be
> DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDOWN_DEFAULT
> i.e. the "N" and the "W" are swapped.
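
For reference, the fix is just the rename below (sketch; the constant's value 
and modifiers are assumed for illustration and may not match DFSConfigKeys 
exactly):

{code}
// Before (typo: "SHUTDONW"):
// public static final int
//     DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT = 3;

// After:
public static final int
    DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDOWN_DEFAULT = 3;
{code}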



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9619:
--
Description: 
We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
replicate a file because a data node is excluded.

{noformat}
File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
(=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
operation.
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
{noformat}

Relevant logs suggest the root cause is a block pool not being found.
{noformat}
2016-01-03 22:11:43,174 [DataXceiver for client 
DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(280)) - host0.foo.com:49997:DataXceiver 
error processing WRITE_BLOCK operation src: /127.0.0.1:47318 dst: 
/127.0.0.1:49997
java.io.IOException: Non existent blockpool 
BP-1927700312-172.26.2.1-145188790
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:203)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
at java.lang.Thread.run(Thread.java:745)
{noformat}

For a bit more context, this test starts a cluster with two name nodes and one 
data node. The block pools are added, but one of them is not found after being 
added. The root cause is undetected concurrent access to a hash map in 
SimulatedFSDataset (two block pools are added simultaneously). The solution 
would be to use a thread-safe class instead, such as ConcurrentHashMap.

  was:
We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
replicate a file because a data node is excluded.

{noformat}
File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
(=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
operation.
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.a

[jira] [Commented] (HDFS-9615) Fix variable name typo in DFSConfigKeys

2016-01-06 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085943#comment-15085943
 ] 

Ray Chiang commented on HDFS-9615:
--

Thanks for the review and the commit!

> Fix variable name typo in DFSConfigKeys
> ---
>
> Key: HDFS-9615
> URL: https://issues.apache.org/jira/browse/HDFS-9615
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HDFS-9615.001.patch
>
>
> Ran across this typo in the variable name:
> DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT
> should clearly be
> DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDOWN_DEFAULT
> i.e. the "N" and the "W" are swapped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9576) HTrace: collect path/offset/length information on read and write operations

2016-01-06 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085932#comment-15085932
 ] 

Masatake Iwasaki commented on HDFS-9576:


I agree with fixing the tracing of writes in a follow-up.

The 02 patch looks good, but one nit: the variable should not be named 
{{ignored}} because it is now used.
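
To make the nit concrete, a sketch of the pattern under review, assuming 
HTrace 4's {{Tracer}}/{{TraceScope}} API (the variable names {{tracer}}, 
{{src}}, {{position}} and {{length}} are placeholders, not the actual patch):

{code}
// Previously the scope was opened and closed but never referenced, so the
// try-with-resources variable was conventionally named "ignored":
//   try (TraceScope ignored = tracer.newScope("DFSInputStream#read")) { ... }
// Once path/offset/length annotations are attached, the variable is used
// and deserves a real name:
try (TraceScope scope = tracer.newScope("DFSInputStream#read")) {
  scope.addKVAnnotation("path", src);
  scope.addKVAnnotation("offset", Long.toString(position));
  scope.addKVAnnotation("length", Integer.toString(length));
  // ... perform the read ...
}
{code}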

> HTrace: collect path/offset/length information on read and write operations
> ---
>
> Key: HDFS-9576
> URL: https://issues.apache.org/jira/browse/HDFS-9576
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, tracing
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, 
> HDFS-9576.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9619:
--
Description: 
We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to 
replicate a file because a data node is excluded.

{noformat}
File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
(=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
operation.
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
{noformat}

Relevant logs suggest the root cause is a block pool not being found.
{noformat}
2016-01-03 22:11:43,174 [DataXceiver for client 
DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(280)) - host0.foo.com:49997:DataXceiver 
error processing WRITE_BLOCK operation src: /127.0.0.1:47318 dst: 
/127.0.0.1:49997
java.io.IOException: Non existent blockpool 
BP-1927700312-172.26.2.1-145188790
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:203)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
at java.lang.Thread.run(Thread.java:745)
{noformat}

For a bit more context, this test starts a cluster with two name nodes and one 
data node. The block pools are added, but one of them is not found after being 
added. The root cause is undetected concurrent access to a hash map in 
SimulatedFSDataset. The solution would be to use a thread-safe class instead, 
such as ConcurrentHashMap.

  was:
We sometimes see TestBalancerWithMultipleNameNodes.testBalancer fail to 
replicate a file because a data node is excluded.

{noformat}
File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
(=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
operation.
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apa

[jira] [Commented] (HDFS-9498) Move code that tracks orphan blocks to BlockManagerSafeMode

2016-01-06 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085933#comment-15085933
 ] 

Anu Engineer commented on HDFS-9498:


+1 (non-binding), LGTM

> Move code that tracks orphan blocks to BlockManagerSafeMode
> ---
>
> Key: HDFS-9498
> URL: https://issues.apache.org/jira/browse/HDFS-9498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, 
> HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch
>
>
> [HDFS-4015] counts and reports orphaned blocks 
> ({{numberOfBytesInFutureBlocks}}) in safe mode. It was implemented in 
> {{BlockManager}}. Per the discussion in [HDFS-9129], which introduces 
> {{BlockManagerSafeMode}}, we can move the code that maintains orphaned 
> blocks to that class.
> The check for blocks with a future generation stamp (GS) when leaving safe 
> mode lives in {{FSNamesystem}}; this code can also be moved to 
> {{BlockManagerSafeMode}}.
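
A hypothetical sketch of the shape of the refactor (all names other than 
{{BlockManagerSafeMode}} and {{numberOfBytesInFutureBlocks}} are assumptions, 
not the actual patch):

{code}
import java.util.concurrent.atomic.AtomicLong;

// Sketch: orphaned-block accounting owned by the safe-mode object rather
// than spread across BlockManager and FSNamesystem.
class BlockManagerSafeModeSketch {
  // bytes in blocks whose generation stamp is in the future
  private final AtomicLong numberOfBytesInFutureBlocks = new AtomicLong();

  void trackFutureBlock(long numBytes) {
    numberOfBytesInFutureBlocks.addAndGet(numBytes);
  }

  // Called when attempting to leave safe mode, so FSNamesystem no longer
  // has to perform the future-GS check itself.
  boolean canLeaveSafeMode(boolean force) {
    return force || numberOfBytesInFutureBlocks.get() == 0;
  }
}
{code}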



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9619:
-

 Summary: DataNode sometimes can not find blockpool for the correct 
namenode
 Key: HDFS-9619
 URL: https://issues.apache.org/jira/browse/HDFS-9619
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


We sometimes see TestBalancerWithMultipleNameNodes.testBalancer fail to 
replicate a file because a data node is excluded.

{noformat}
File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
(=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
operation.
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
{noformat}

Relevant logs suggest the root cause is a block pool not being found.
{noformat}
2016-01-03 22:11:43,174 [DataXceiver for client 
DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(280)) - host0.foo.com:49997:DataXceiver 
error processing WRITE_BLOCK operation src: /127.0.0.1:47318 dst: 
/127.0.0.1:49997
java.io.IOException: Non existent blockpool 
BP-1927700312-172.26.2.1-145188790
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:203)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
at java.lang.Thread.run(Thread.java:745)
{noformat}

For a bit more context, this test starts a cluster with two name nodes and one 
data node. The block pools are added, but one of them is not found after being 
added. The root cause is undetected concurrent access to a hash map in 
SimulatedFSDataset. The solution would be to use a thread-safe class instead, 
such as ConcurrentHashMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9615) Fix variable name typo in DFSConfigKeys

2016-01-06 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9615:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

I've committed this to trunk. 

Since HDFS-6353, which introduced this setting, is not in branch-2, no commit 
to branch-2 is required. Thanks for the contribution [~rchiang].

> Fix variable name typo in DFSConfigKeys
> ---
>
> Key: HDFS-9615
> URL: https://issues.apache.org/jira/browse/HDFS-9615
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HDFS-9615.001.patch
>
>
> Ran across this typo in the variable name:
> DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT
> should clearly be
> DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDOWN_DEFAULT
> i.e. the "N" and the "W" are swapped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9615) Fix variable name typo in DFSConfigKeys

2016-01-06 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9615:

Summary: Fix variable name typo in DFSConfigKeys  (was: Fix variable name 
typo in 
DFSConfigKeys#DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT)

> Fix variable name typo in DFSConfigKeys
> ---
>
> Key: HDFS-9615
> URL: https://issues.apache.org/jira/browse/HDFS-9615
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Trivial
> Attachments: HDFS-9615.001.patch
>
>
> Ran across this typo in the variable name:
> DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT
> should clearly be
> DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDOWN_DEFAULT
> i.e. the "N" and the "W" are swapped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2016-01-06 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085887#comment-15085887
 ] 

Daryn Sharp commented on HDFS-9276:
---

I'll be taking a look to ensure this doesn't break our IP-failover HA.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running 
> applications. The HDFS client generates private tokens for each NameNode. 
> When we update the HDFS Delegation Token, these private tokens are not 
> updated, which causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration to Name Node:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   

[jira] [Commented] (HDFS-6142) StandbyException wrapped to InvalidToken exception

2016-01-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085867#comment-15085867
 ] 

Kihwal Lee commented on HDFS-6142:
--

bq. For example, in datanode's DataXceiver.copyBlock, it will call 
checkAccess...
That's a block token, not a delegation token.

> StandbyException wrapped to InvalidToken exception
> --
>
> Key: HDFS-6142
> URL: https://issues.apache.org/jira/browse/HDFS-6142
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.2.0
>Reporter: Ding Yuan
>
> The following code in 
> org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java:
> {noformat}
>   public byte[] retrievePassword(
>   DelegationTokenIdentifier identifier) throws InvalidToken {
> try {
>   // this check introduces inconsistency in the authentication to a
>   // HA standby NN.  non-token auths are allowed into the namespace which
>   // decides whether to throw a StandbyException.  tokens are a bit
>   // different in that a standby may be behind and thus not yet know
>   // of all tokens issued by the active NN.  the following check does
>   // not allow ANY token auth, however it should allow known tokens in
>   namesystem.checkOperation(OperationCategory.READ);
> } catch (StandbyException se) {
>   // FIXME: this is a hack to get around changing method signatures by
>   // tunneling a non-InvalidToken exception as the cause which the
>   // RPC server will unwrap before returning to the client
>   InvalidToken wrappedStandby = new InvalidToken("StandbyException");
>   wrappedStandby.initCause(se);
>   throw wrappedStandby;
> }
> return super.retrievePassword(identifier);
>   }
> {noformat}
> A StandbyException from namesystem.checkOperation is wrapped in an 
> InvalidToken exception. The comment suggests that the RPC server will unwrap 
> it to a StandbyException before sending it back to the client, but this may 
> not be the case for every code path. For example, the datanode's 
> DataXceiver.copyBlock calls checkAccess, which might eventually call 
> retrievePassword, but when copyBlock catches an InvalidToken exception, it 
> simply sends that exception to the client without unwrapping it. 
> I am not exactly sure about the possible consequences, but it seems the 
> client treats a StandbyException (which is perhaps much more serious) very 
> differently from an InvalidToken exception.
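
One hedged sketch of what unwrapping at such a catch site could look like 
(hypothetical fragment; the {{checkAccess}} signature is a placeholder, not 
code from the issue):

{code}
try {
  checkAccess(block, token, op);  // may call retrievePassword() internally
} catch (InvalidToken it) {
  // Unwrap the non-InvalidToken cause that retrievePassword() tunneled in,
  // so the client can tell a standby NN apart from a genuinely bad token.
  Throwable cause = it.getCause();
  if (cause instanceof StandbyException) {
    throw (StandbyException) cause;
  }
  throw it;
}
{code}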



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085715#comment-15085715
 ] 

Hadoop QA commented on HDFS-8999:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 37s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-hdfs-project (total was 633, now 632). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 33s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 16s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 174m 43s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapsho

[jira] [Commented] (HDFS-9279) Decommissioned capacity should not be considered for configured/used capacity

2016-01-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085577#comment-15085577
 ] 

Kihwal Lee commented on HDFS-9279:
--

bq. Because the data present in the decommissioning nodes would eventually be 
transferred over to the live nodes. Is this understanding correct?
The replicas are not invalidated on decommissioning nodes even after 
replication, so the capacity tracking was not accurate either. It ended up 
double counting the used space toward the end, at which point the process 
seems to stall more frequently nowadays (this is another topic). If a 
significant portion of a cluster is decommissioned, the stat will look very 
strange and confuse people. That has actually happened to us multiple times. 
The free/total ratio will look considerably smaller than the actual value. 
Monitoring tools cannot easily dismiss it as 'Nah.. it's a temporary 
discrepancy caused by decommissioning.'

With this change, the storage capacity stat has become more like a regular 
under-replication scenario caused by node/disk outages. Additional space will 
be used for re-replicating those blocks, but it is not yet allocated to them. 
That is the actual state of used/usable storage, and the stat now reflects it. 
If we want the stat to reflect what would be used in the future, we are 
talking about a space reservation feature.
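
To make the double counting concrete (hypothetical numbers, for illustration 
only): if a 100 TB slice of a cluster holding 60 TB of replicas is 
decommissioned, that 60 TB is re-replicated onto live nodes while still being 
counted on the decommissioning nodes, so used space is inflated by up to 60 TB 
and the free/total ratio looks correspondingly worse until the nodes are 
actually removed.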


> Decommissioned capacity should not be considered for configured/used capacity
> 
>
> Key: HDFS-9279
> URL: https://issues.apache.org/jira/browse/HDFS-9279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9279-v1.patch, HDFS-9279-v2.patch, 
> HDFS-9279-v3.patch, HDFS-9279-v4.patch
>
>
> The capacity of a decommissioned node is being counted in the configured and 
> used capacity metrics. This gives an incorrect perception of cluster usage.
> Once a node is decommissioned, its capacity should be treated like that of a 
> dead node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9617) my java client uses multiple threads to put the same file to the same hdfs uri; after a no lease error, the client gets OutOfMemoryError

2016-01-06 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085526#comment-15085526
 ] 

Kai Zheng commented on HDFS-9617:
-

Thanks for reporting this.
bq. my java client uses multiple threads to put the same file to the same hdfs uri
I'm a little confused. How did you do this, and what does your code look like? 
Would you elaborate a little bit? It may help us understand why it happened. 
Thanks.
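
For concreteness, one guess at what "multiple threads putting the same file" 
might look like (entirely hypothetical; the path and payload size are 
placeholders taken from the stack trace below, not the reporter's code):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Several threads racing to create and write the same HDFS path. A writer
// whose file is recreated underneath it loses its lease and can then hit
// "No lease ... File does not exist" while its queued packets pile up.
public class ConcurrentPutGuess {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    final Path target = new Path("/Tmp2/43.bmp.tmp");
    for (int i = 0; i < 4; i++) {
      new Thread(() -> {
        try (FSDataOutputStream out = fs.create(target, true /* overwrite */)) {
          out.write(new byte[64 * 1024 * 1024]);  // placeholder payload
        } catch (Exception e) {
          e.printStackTrace();
        }
      }).start();
    }
  }
}
{code}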

> my java client uses multiple threads to put the same file to the same hdfs 
> uri; after a no lease error, the client gets OutOfMemoryError
> ---
>
> Key: HDFS-9617
> URL: https://issues.apache.org/jira/browse/HDFS-9617
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zuotingbing
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /Tmp2/43.bmp.tmp (inode 2913263): File does not exist. [Lease.  
> Holder: DFSClient_NONMAPREDUCE_2084151715_1, pendingcreates: 250]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3358)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3160)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3042)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1653)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:391)
>   at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1473)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1290)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:536)
> my java client (JVM -Xmx=2G):
> jmap TOP15:
> num #instances #bytes  class name
> --
>1: 48072 2053976792  [B
>2: 45852 5987568  
>3: 45852 5878944  
>4:  3363 4193112  
>5:  3363 2548168  
>6:  2733 2299008  
>7:   533 2191696  [Ljava.nio.ByteBuffer;
>8: 24733 2026600  [C
>9: 31287 2002368  org.apache.hadoop.hdfs.DFSOutputStream$Packet
>   10: 31972 767328  java.util.LinkedList$Node
>   11: 22845 548280  java.lang.String
>   12: 20372 488928  java.util.concurrent.atomic.AtomicLong
>   13:  3700 452984  java.lang.Class
>   14:   981 439576  
>   15:  5583 376344  [S



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9612) DistCp worker threads are not terminated after jobs are done.

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085507#comment-15085507
 ] 

Hadoop QA commented on HDFS-9612:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 57s 
{color} | {color:red} hadoop-tools_hadoop-distcp-jdk1.8.0_66 with JDK v1.8.0_66 
generated 1 new issues (was 51, now 51). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 53s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 50s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 26s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780744/HDFS-9612.005.patch |
| JIRA Issue | HDFS-9612 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b36a01e84ad5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ma

[jira] [Updated] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-06 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8999:
--
Attachment: h8999_20160106c.patch

h8999_20160106c.patch: fixes an NPE.

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8999_20151228.patch, h8999_20160106.patch, 
> h8999_20160106b.patch, h8999_20160106c.patch
>
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let NameNode wait for all the block_received msgs to 
> announce the replica is safe. Looking into the code, now we have
># NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
># If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
># When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts in all the replicas that NN has already 
> received the block_received msg, but based on the above #2 and #3, it may be 
> safe to also count in all the replicas in the 
> BlockUnderConstructionFeature#replicas?
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9612) DistCp worker threads are not terminated after jobs are done.

2016-01-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9612:
--
Attachment: HDFS-9612.005.patch

Rev05: added @throws to make Javadoc happy.

> DistCp worker threads are not terminated after jobs are done.
> -
>
> Key: HDFS-9612
> URL: https://issues.apache.org/jira/browse/HDFS-9612
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.8.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9612.001.patch, HDFS-9612.002.patch, 
> HDFS-9612.003.patch, HDFS-9612.004.patch, HDFS-9612.005.patch
>
>
> In HADOOP-11827, a producer-consumer style thread pool was introduced to 
> parallelize the task of listing files/directories.
> We have a use case where a distcp job is run during the commit phase of an 
> MR2 job. However, it was found that distcp does not terminate 
> ProducerConsumer thread pools properly. Because the threads are not 
> terminated, those MR2 jobs never finish.
> In the more typical use case where distcp is run as a standalone job, those 
> threads are terminated forcefully when the java process exits, so these 
> leaked threads did not become a problem.
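
For illustration, a minimal sketch of the general fix pattern using plain 
{{java.util.concurrent}} (the actual ProducerConsumer class in distcp has its 
own shutdown mechanics; this is not the patch):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Non-daemon worker threads keep the JVM (or an MR2 commit phase) alive
// unless the pool is explicitly shut down once the work is done.
public class PoolShutdownSketch {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService workers = Executors.newFixedThreadPool(4);
    workers.submit(() -> System.out.println("listing files..."));
    // Without these calls the worker threads linger and the enclosing job
    // never finishes; that is the essence of the leak being fixed here.
    workers.shutdown();
    if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
      workers.shutdownNow();  // interrupt any stragglers
    }
  }
}
{code}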



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085482#comment-15085482
 ] 

Hadoop QA commented on HDFS-8999:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 38s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 38s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-hdfs-project (total was 633, now 632). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 44s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 4s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 151m 15s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestModTime |
|   | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
|   | hadoop.hdfs.server.namenode.TestFileContextAcl |
|   | hadoop.hdfs.TestErasureCodingPolicies |
|   | hado

[jira] [Commented] (HDFS-9279) Decomissioned capacity should not be considered for configured/used capacity

2016-01-06 Thread Rajat Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085445#comment-15085445
 ] 

Rajat Jain commented on HDFS-9279:
--

While it makes sense not to include decommissioning nodes in the configured 
capacity, they should still be counted when calculating the used capacity, 
because the data on the decommissioning nodes will eventually be transferred 
over to the live nodes. Is this understanding correct?

> Decomissioned capacity should not be considered for configured/used capacity
> 
>
> Key: HDFS-9279
> URL: https://issues.apache.org/jira/browse/HDFS-9279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9279-v1.patch, HDFS-9279-v2.patch, 
> HDFS-9279-v3.patch, HDFS-9279-v4.patch
>
>
> The capacity of a decommissioned node is being counted in the configured 
> and used capacity metrics. This gives an incorrect perception of cluster 
> usage.
> Once a node is decommissioned, its capacity should be treated like that of 
> a dead node.
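
As a tiny, hypothetical sketch of the accounting rule under discussion 
(invented types; not the actual DatanodeDescriptor/FSNamesystem code), 
decommissioned nodes are simply excluded from the configured-capacity sum, 
just like dead nodes:
{code}
import java.util.List;

// Invented stand-ins for NameNode-side datanode records.
class CapacitySketch {
  static class Node {
    long capacity;
    boolean alive;
    boolean decommissioned;
  }

  // Count a node toward configured capacity only while it is live and in
  // service, i.e. treat decommissioned nodes the same as dead nodes.
  static long configuredCapacity(List<Node> nodes) {
    long total = 0;
    for (Node n : nodes) {
      if (n.alive && !n.decommissioned) {
        total += n.capacity;
      }
    }
    return total;
  }
}
{code}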



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085371#comment-15085371
 ] 

Kai Zheng commented on HDFS-9618:
-

Good catch! The {{logger.isInfoEnabled}} guard pattern shouldn't be used 
without a reason. I guess the case in question uses the 
{{blockLog.isInfoEnabled()}} condition to decide whether to compose and write 
the log message, as a performance consideration. Then is there any reason for 
the following block? It would be better to change it as well in this fix.
{code}
if (blockLog.isDebugEnabled()) {
  blockLog.debug("BLOCK* neededReplications = {} pendingReplications = {}",
      neededReplications.size(), pendingReplications.size());
}
{code}
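
For reference, a minimal SLF4J sketch of when such a guard actually matters 
(invented class; not BlockManager code): parameterized {} logging already 
defers message formatting until the level check passes, so a guard only pays 
off when building an argument is itself expensive.
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Invented example class, not BlockManager.
public class LogGuardSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(LogGuardSketch.class);

  void example(int needed, int pending) {
    // No guard needed: the arguments are cheap, and {} formatting is
    // skipped internally when debug logging is off.
    LOG.debug("neededReplications = {} pendingReplications = {}",
        needed, pending);

    // A guard still helps when computing an argument is costly.
    if (LOG.isDebugEnabled()) {
      LOG.debug("targets = {}", buildTargetList());
    }
  }

  private String buildTargetList() {
    return "datanode(s) ...";  // stand-in for a costly loop over targets
  }
}
{code}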

> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>
> Debug log message is constructed when {{Logger#isInfoEnabled}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085335#comment-15085335
 ] 

Masatake Iwasaki commented on HDFS-9618:


The log level had been info, but it seems to have been changed to debug in 
the EC branch (6b6a63bb).

> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>
> Debug log message is constructed when {{Logger#isInfoEnabled}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085324#comment-15085324
 ] 

Masatake Iwasaki commented on HDFS-9618:


{code}
if (blockLog.isInfoEnabled()) {
  // log which blocks have been scheduled for replication
  for (BlockRecoveryWork rw : recovWork) {
    DatanodeStorageInfo[] targets = rw.getTargets();
    if (targets != null && targets.length != 0) {
      StringBuilder targetList = new StringBuilder("datanode(s)");
      for (DatanodeStorageInfo target : targets) {
        targetList.append(' ');
        targetList.append(target.getDatanodeDescriptor());
      }
      blockLog.debug("BLOCK* ask {} to replicate {} to {}", rw.getSrcNodes(),
          rw.getBlock(), targetList);
    }
  }
}
{code}
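
One possible alignment of the quoted snippet (assuming debug is the intended 
level after the EC branch change; the committed patch may instead restore 
info on both sides):
{code}
// Hypothetical fix: the guard now matches the debug-level call.
if (blockLog.isDebugEnabled()) {
  // log which blocks have been scheduled for replication
  for (BlockRecoveryWork rw : recovWork) {
    DatanodeStorageInfo[] targets = rw.getTargets();
    if (targets != null && targets.length != 0) {
      StringBuilder targetList = new StringBuilder("datanode(s)");
      for (DatanodeStorageInfo target : targets) {
        targetList.append(' ');
        targetList.append(target.getDatanodeDescriptor());
      }
      blockLog.debug("BLOCK* ask {} to replicate {} to {}", rw.getSrcNodes(),
          rw.getBlock(), targetList);
    }
  }
}
{code}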


> Fix mismatch between log level and guard in 
> BlockManager#computeRecoveryWorkForBlocks
> -
>
> Key: HDFS-9618
> URL: https://issues.apache.org/jira/browse/HDFS-9618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>
> Debug log message is constructed when {{Logger#isInfoEnabled}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks

2016-01-06 Thread Masatake Iwasaki (JIRA)
Masatake Iwasaki created HDFS-9618:
--

 Summary: Fix mismatch between log level and guard in 
BlockManager#computeRecoveryWorkForBlocks
 Key: HDFS-9618
 URL: https://issues.apache.org/jira/browse/HDFS-9618
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Minor


Debug log message is constructed when {{Logger#isInfoEnabled}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-06 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8999:
--
Attachment: h8999_20160106b.patch

h8999_20160106b.patch: addresses Jing's comment.

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8999_20151228.patch, h8999_20160106.patch, 
> h8999_20160106b.patch
>
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let the NameNode wait for all the block_received msgs 
> before announcing the replica is safe. Looking into the code, we now have
># NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
># If any DataNode fails during the writing, the client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to the 
> NameNode through the updatePipeline RPC.
># When the client has received the ack for the last packet of the block (and 
> before the client tries to close the file on the NameNode), the replica has 
> been finalized on all the DataNodes.
> In this case, then, when the NameNode receives the close request from the 
> client, it already knows the latest replicas for the block. Currently the 
> checkReplication call only counts the replicas for which the NN has already 
> received the block_received msg, but based on #2 and #3 above, it may be 
> safe to also count the replicas in BlockUnderConstructionFeature#replicas?
> {quote}
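
As a minimal, hypothetical sketch of the counting change proposed in the 
quote (invented method signatures; not the actual 
BlockManager/BlockUnderConstructionFeature code):
{code}
import java.util.List;

// Invented stand-ins for NameNode-side replica bookkeeping.
class ReplicaCountSketch {
  // Current behavior: checkReplication counts only replicas for which a
  // block_received message has already arrived.
  static int countReported(List<String> reportedReplicas) {
    return reportedReplicas.size();
  }

  // Proposed: also count replicas known only from the write pipeline, since
  // points #2 and #3 above imply they are already finalized by the time the
  // client issues the close request.
  static int countIncludingPipeline(List<String> reportedReplicas,
      List<String> pipelineOnlyReplicas) {
    return reportedReplicas.size() + pipelineOnlyReplicas.size();
  }
}
{code}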



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

