[jira] [Updated] (HDFS-9624) DataNode starts slowly due to the initial DU command operations
[ https://issues.apache.org/jira/browse/HDFS-9624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-9624: Attachment: HDFS-9624.001.patch > DataNode starts slowly due to the initial DU command operations > -- > > Key: HDFS-9624 > URL: https://issues.apache.org/jira/browse/HDFS-9624 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-9624.001.patch > > > The DataNode starts very slowly after I finish the migration of > datanodes and restart them. Looking at the DN logs: > {code} > 2016-01-06 16:05:08,118 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added > new volume: DS-70097061-42f8-4c33-ac27-2a6ca21e60d4 > 2016-01-06 16:05:08,118 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added > volume - /home/data/data/hadoop/dfs/data/data12/current, StorageType: DISK > 2016-01-06 16:05:08,176 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Registered FSDatasetState MBean > 2016-01-06 16:05:08,177 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 > 2016-01-06 16:05:08,178 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data2/current... > 2016-01-06 16:05:08,179 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data3/current... > 2016-01-06 16:05:08,179 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data4/current... > 2016-01-06 16:05:08,179 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data5/current... > 2016-01-06 16:05:08,180 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data6/current... > 2016-01-06 16:05:08,180 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data7/current... > 2016-01-06 16:05:08,180 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data8/current... > 2016-01-06 16:05:08,180 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data9/current... > 2016-01-06 16:05:08,181 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data10/current... > 2016-01-06 16:05:08,181 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data11/current... 
> 2016-01-06 16:05:08,181 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume > /home/data/data/hadoop/dfs/data/data12/current... > 2016-01-06 16:09:49,646 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time > taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on > /home/data/data/hadoop/dfs/data/data7/current: 281466ms > 2016-01-06 16:09:54,235 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time > taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on > /home/data/data/hadoop/dfs/data/data9/current: 286054ms > 2016-01-06 16:09:57,859 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time > taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on > /home/data/data/hadoop/dfs/data/data2/current: 289680ms > 2016-01-06 16:10:00,333 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time > taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on > /home/data/data/hadoop/dfs/data/data5/current: 292153ms > 2016-01-06 16:10:05,696 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.Fs
[jira] [Created] (HDFS-9624) DataNode starts slowly due to the initial DU command operations
Lin Yiqun created HDFS-9624: --- Summary: DataNode starts slowly due to the initial DU command operations Key: HDFS-9624 URL: https://issues.apache.org/jira/browse/HDFS-9624 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun The DataNode starts very slowly after I finish the migration of datanodes and restart them. Looking at the DN logs: {code} 2016-01-06 16:05:08,118 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-70097061-42f8-4c33-ac27-2a6ca21e60d4 2016-01-06 16:05:08,118 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /home/data/data/hadoop/dfs/data/data12/current, StorageType: DISK 2016-01-06 16:05:08,176 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean 2016-01-06 16:05:08,177 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-1942012336-xx.xx.xx.xx-1406726500544 2016-01-06 16:05:08,178 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data2/current... 2016-01-06 16:05:08,179 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data3/current... 2016-01-06 16:05:08,179 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data4/current... 2016-01-06 16:05:08,179 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data5/current... 2016-01-06 16:05:08,180 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data6/current... 2016-01-06 16:05:08,180 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data7/current... 2016-01-06 16:05:08,180 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data8/current... 2016-01-06 16:05:08,180 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data9/current... 2016-01-06 16:05:08,181 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data10/current... 2016-01-06 16:05:08,181 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data11/current... 2016-01-06 16:05:08,181 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on volume /home/data/data/hadoop/dfs/data/data12/current... 
2016-01-06 16:09:49,646 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on /home/data/data/hadoop/dfs/data/data7/current: 281466ms 2016-01-06 16:09:54,235 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on /home/data/data/hadoop/dfs/data/data9/current: 286054ms 2016-01-06 16:09:57,859 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on /home/data/data/hadoop/dfs/data/data2/current: 289680ms 2016-01-06 16:10:00,333 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on /home/data/data/hadoop/dfs/data/data5/current: 292153ms 2016-01-06 16:10:05,696 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on /home/data/data/hadoop/dfs/data/data8/current: 297516ms 2016-01-06 16:10:11,229 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-1942012336-xx.xx.xx.xx-1406726500544 on /home/data/data/hadoop/dfs/data/data6/current: 303049ms 2016-01-06 16:10:28,075 INFO org.apac
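The ~280-second per-volume times above come from the initial DU pass that runs when each block pool slice is added. One common mitigation, sketched below purely for illustration, is to persist the last computed usage and trust it on restart while it is still fresh; the file name, format, and tolerance window here are assumptions, not the contents of HDFS-9624.001.patch.
{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

class CachedDfsUsed {
  private static final String CACHE_FILE = "dfsUsed";      // assumed file name
  private static final long TOLERANCE_MS = 10 * 60 * 1000; // assumed freshness window

  /** Try the cached value first; fall back to the slow full scan. */
  long getDfsUsed(File bpDir) throws IOException {
    File cache = new File(bpDir, CACHE_FILE);
    if (cache.exists()) {
      // Assumed format: a single line "<usedBytes> <wallClockMillis>".
      List<String> lines = Files.readAllLines(cache.toPath());
      String[] parts = lines.get(0).trim().split(" ");
      if (parts.length == 2) {
        long used = Long.parseLong(parts[0]);
        long savedAt = Long.parseLong(parts[1]);
        if (System.currentTimeMillis() - savedAt < TOLERANCE_MS) {
          return used; // fresh enough: skip the expensive du
        }
      }
    }
    return runFullDuScan(bpDir);
  }

  private long runFullDuScan(File bpDir) {
    // Placeholder for the existing logic that walks the tree / shells out to
    // du -- the slow path responsible for the ~280s scans in the log above.
    return 0L;
  }
}
{code}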
[jira] [Updated] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages
[ https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zhou updated HDFS-9608: --- Attachment: HDFS-9608.02.patch Thanks Kai for the helpful suggestions! Modifications were made to the previous patch accordingly. Sorry about Item 2, I just forgot to delete it. Thanks! > Disk IO imbalance in HDFS with heterogeneous storages > - > > Key: HDFS-9608 > URL: https://issues.apache.org/jira/browse/HDFS-9608 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei Zhou >Assignee: Wei Zhou > Attachments: HDFS-9608.01.patch, HDFS-9608.02.patch > > > Currently RoundRobinVolumeChoosingPolicy use a shared index to choose volumes > in HDFS with heterogeneous storages, this leads to non-RR choosing mode for > certain type of storage. > Besides, it uses a shared lock for synchronization which limits the > concurrency of volume choosing process. Volume choosing threads that > operating on different storage types should be able to run concurrently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
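To make the imbalance concrete: with one shared cursor, a DISK request can advance the index past an SSD volume's turn and vice versa, so neither type sees a true round-robin. A minimal sketch of the per-type alternative the description argues for (an independent cursor per storage type, no shared lock) might look like the following; the class and enum are simplified stand-ins, not the actual RoundRobinVolumeChoosingPolicy change in the patch:
{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

enum StorageKind { DISK, SSD, ARCHIVE, RAM_DISK } // stand-in for StorageType

class PerTypeRoundRobin<V> {
  // One independent cursor per storage type, so threads choosing volumes of
  // different types never contend on shared state.
  private final Map<StorageKind, AtomicInteger> cursors = new ConcurrentHashMap<>();

  V choose(StorageKind kind, List<V> volumesOfKind) {
    AtomicInteger cursor = cursors.computeIfAbsent(kind, k -> new AtomicInteger());
    // floorMod keeps the index valid even after the int counter wraps around.
    int i = Math.floorMod(cursor.getAndIncrement(), volumesOfKind.size());
    return volumesOfKind.get(i);
  }
}
{code}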
[jira] [Commented] (HDFS-8356) Document missing properties in hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086939#comment-15086939 ] Ray Chiang commented on HDFS-8356: -- RE: Failing unit tests Different set than previous run and both tests using JDK 8 in my tree. > Document missing properties in hdfs-default.xml > --- > > Key: HDFS-8356 > URL: https://issues.apache.org/jira/browse/HDFS-8356 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.7.0 >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: supportability, test > Attachments: HDFS-8356.001.patch, HDFS-8356.002.patch, > HDFS-8356.003.patch, HDFS-8356.004.patch > > > The following properties are currently not defined in hdfs-default.xml. These > properties should either be > A) documented in hdfs-default.xml OR > B) listed as an exception (with comments, e.g. for internal use) in the > TestHdfsConfigFields unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
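For option A, "documented in hdfs-default.xml" means each property gets an entry of the following shape; the property below is invented purely to show the expected fields and is not one of the missing ones:
{code}
<!-- Hypothetical entry, for illustration only. -->
<property>
  <name>dfs.example.checker.interval.ms</name>
  <value>600000</value>
  <description>
    What the property controls, its unit (milliseconds here), and its
    default; per the issue description, every key must either have such an
    entry or be listed as an explicit exception in TestHdfsConfigFields.
  </description>
</property>
{code}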
[jira] [Commented] (HDFS-9600) do not check replication if the block is under construction
[ https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086927#comment-15086927 ] Vinayakumar B commented on HDFS-9600: - Merged to branch-2.8 as well. > do not check replication if the block is under construction > --- > > Key: HDFS-9600 > URL: https://issues.apache.org/jira/browse/HDFS-9600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Phil Yang >Assignee: Phil Yang >Priority: Critical > Fix For: 2.8.0, 2.7.3, 2.6.4 > > Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, > HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, > HDFS-9600-v3.patch, HDFS-9600-v4.patch > > > When appending a file, we will update the pipeline to bump a new GS, and the old > GS will be considered out of date. When changing the GS, in > BlockInfo.setGenerationStampAndVerifyReplicas we will remove replicas having the > old GS, which means we will remove all replicas, because no DN has the new GS until > the block with the new GS is added to the blockMaps again by > DatanodeProtocol.blockReceivedAndDeleted. > If we check the replication of this block before it is added back, it will be > regarded as missing. The probability is low, but if there are decommissioning > nodes, the DecommissionManager.Monitor will scan all blocks belonging to > decommissioning nodes at a very fast speed, so the probability of finding a > missing block is very high, but actually they are not missing. > Furthermore, after closing the appended file, in > FSNamesystem.finalizeINodeFileUnderConstruction, it will checkReplication. If > some of the nodes are decommissioning, this block with the new GS will be added to the > UnderReplicatedBlocks map, so there are two blocks with the same ID in this map: > one is in QUEUE_WITH_CORRUPT_BLOCKS and the other is in > QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. And there will be many > missing-block warnings on the NameNode website, but there are no corrupt files... > Therefore, I think the solution is that we should not check replication if the > block is under construction; we only check complete blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
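The fix direction in the description reduces to a guard at the top of the replication check. A self-contained sketch of that idea, using stand-in types rather than the real BlockInfo/BlockManager and not the literal patch:
{code}
class BlockStub { // stand-in for the real org.apache.hadoop.hdfs BlockInfo
  enum State { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }
  State state = State.UNDER_CONSTRUCTION;
  boolean isComplete() { return state == State.COMPLETE; }
}

class ReplicationChecker {
  void checkReplication(BlockStub block) {
    if (!block.isComplete()) {
      // An appended block's replicas with the old GS were just removed, and
      // replicas with the new GS arrive only via blockReceivedAndDeleted, so
      // the block can transiently look "missing" even though every DN has it.
      return;
    }
    // ... existing logic: compare live replica count to expected replication ...
  }
}
{code}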
[jira] [Updated] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages
[ https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zhou updated HDFS-9608: --- Description: Currently RoundRobinVolumeChoosingPolicy use a shared index to choose volumes in HDFS with heterogeneous storages, this leads to non-RR choosing mode for certain type of storage. Besides, it uses a shared lock for synchronization which limits the concurrency of volume choosing process. Volume choosing threads that operating on different storage types should be able to run concurrently. was:Currently RoundRobinVolumeChoosingPolicy use a shared index to choose volumes in HDFS with heterogeneous storages, this leads to non-RR choosing mode for certain type of storage. > Disk IO imbalance in HDFS with heterogeneous storages > - > > Key: HDFS-9608 > URL: https://issues.apache.org/jira/browse/HDFS-9608 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei Zhou >Assignee: Wei Zhou > Attachments: HDFS-9608.01.patch > > > Currently RoundRobinVolumeChoosingPolicy use a shared index to choose volumes > in HDFS with heterogeneous storages, this leads to non-RR choosing mode for > certain type of storage. > Besides, it uses a shared lock for synchronization which limits the > concurrency of volume choosing process. Volume choosing threads that > operating on different storage types should be able to run concurrently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9600) do not check replication if the block is under construction
[ https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086911#comment-15086911 ] Vinayakumar B commented on HDFS-9600: - Committed to trunk, branch-2, branch-2.7 and branch-2.6. Thanks all. > do not check replication if the block is under construction > --- > > Key: HDFS-9600 > URL: https://issues.apache.org/jira/browse/HDFS-9600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Phil Yang >Assignee: Phil Yang >Priority: Critical > Fix For: 2.7.3, 2.6.4 > > Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, > HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, > HDFS-9600-v3.patch, HDFS-9600-v4.patch > > > When appending a file, we will update the pipeline to bump a new GS, and the old > GS will be considered out of date. When changing the GS, in > BlockInfo.setGenerationStampAndVerifyReplicas we will remove replicas having the > old GS, which means we will remove all replicas, because no DN has the new GS until > the block with the new GS is added to the blockMaps again by > DatanodeProtocol.blockReceivedAndDeleted. > If we check the replication of this block before it is added back, it will be > regarded as missing. The probability is low, but if there are decommissioning > nodes, the DecommissionManager.Monitor will scan all blocks belonging to > decommissioning nodes at a very fast speed, so the probability of finding a > missing block is very high, but actually they are not missing. > Furthermore, after closing the appended file, in > FSNamesystem.finalizeINodeFileUnderConstruction, it will checkReplication. If > some of the nodes are decommissioning, this block with the new GS will be added to the > UnderReplicatedBlocks map, so there are two blocks with the same ID in this map: > one is in QUEUE_WITH_CORRUPT_BLOCKS and the other is in > QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. And there will be many > missing-block warnings on the NameNode website, but there are no corrupt files... > Therefore, I think the solution is that we should not check replication if the > block is under construction; we only check complete blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages
[ https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zhou updated HDFS-9608: --- Description: Currently RoundRobinVolumeChoosingPolicy use a shared index to choose volumes in HDFS with heterogeneous storages, this leads to non-RR choosing mode for certain type of storage. (was: Currently RoundRobinVolumeChoosingPolicy use a shared index to choose volumes in HDFS with heterogeneous storages, this leads to non-RR choosing mode for certain type of storage.) > Disk IO imbalance in HDFS with heterogeneous storages > - > > Key: HDFS-9608 > URL: https://issues.apache.org/jira/browse/HDFS-9608 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei Zhou >Assignee: Wei Zhou > Attachments: HDFS-9608.01.patch > > > Currently RoundRobinVolumeChoosingPolicy use a shared index to choose volumes > in HDFS with heterogeneous storages, this leads to non-RR choosing mode for > certain type of storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9600) do not check replication if the block is under construction
[ https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086858#comment-15086858 ] Vinayakumar B commented on HDFS-9600: - bq. The native build fails when libwebhdfs in contrib is built. This is not the case if you simply do -Pnative. I think it is HDFS-8346. That might be another reason for the failure in branch-2, but I have seen it with both docker and non-docker mode in branch-2.6. It fails in branch-2.6 in docker mode because dev-support/DockerFile does not exist in branch-2.6. > do not check replication if the block is under construction > --- > > Key: HDFS-9600 > URL: https://issues.apache.org/jira/browse/HDFS-9600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Phil Yang >Assignee: Phil Yang >Priority: Critical > Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, > HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, > HDFS-9600-v3.patch, HDFS-9600-v4.patch > > > When appending a file, we will update the pipeline to bump a new GS, and the old > GS will be considered out of date. When changing the GS, in > BlockInfo.setGenerationStampAndVerifyReplicas we will remove replicas having the > old GS, which means we will remove all replicas, because no DN has the new GS until > the block with the new GS is added to the blockMaps again by > DatanodeProtocol.blockReceivedAndDeleted. > If we check the replication of this block before it is added back, it will be > regarded as missing. The probability is low, but if there are decommissioning > nodes, the DecommissionManager.Monitor will scan all blocks belonging to > decommissioning nodes at a very fast speed, so the probability of finding a > missing block is very high, but actually they are not missing. > Furthermore, after closing the appended file, in > FSNamesystem.finalizeINodeFileUnderConstruction, it will checkReplication. If > some of the nodes are decommissioning, this block with the new GS will be added to the > UnderReplicatedBlocks map, so there are two blocks with the same ID in this map: > one is in QUEUE_WITH_CORRUPT_BLOCKS and the other is in > QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. And there will be many > missing-block warnings on the NameNode website, but there are no corrupt files... > Therefore, I think the solution is that we should not check replication if the > block is under construction; we only check complete blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9623) Update example configuration of block state change log in log4j.properties
[ https://issues.apache.org/jira/browse/HDFS-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-9623: --- Attachment: HDFS-9623.001.patch > Update example configuration of block state change log in log4j.properties > -- > > Key: HDFS-9623 > URL: https://issues.apache.org/jira/browse/HDFS-9623 > Project: Hadoop HDFS > Issue Type: Bug > Components: logging >Affects Versions: 2.8.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-9623.001.patch > > > The log level of block state change log was changed from INFO to DEBUG by > HDFS-6860. The example configuration in log4j.properties should be updated > along with the change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9623) Update example configuration of block state change log in log4j.properties
Masatake Iwasaki created HDFS-9623: -- Summary: Update example configuration of block state change log in log4j.properties Key: HDFS-9623 URL: https://issues.apache.org/jira/browse/HDFS-9623 Project: Hadoop HDFS Issue Type: Bug Components: logging Affects Versions: 2.8.0 Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor The log level of block state change log was changed from INFO to DEBUG by HDFS-6860. The example configuration in log4j.properties should be updated along with the change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
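The change is confined to the commented example shipped in log4j.properties. Since BlockStateChange now logs at DEBUG, an example that toggles it should reference DEBUG rather than the old INFO level; the exact wording below is an illustrative sketch, not the text of HDFS-9623.001.patch:
{code}
# Block state change messages were moved from INFO to DEBUG by HDFS-6860.
# Illustrative example entry; see the attached patch for the exact wording.
# Uncomment to re-enable verbose block state change logging:
#log4j.logger.BlockStateChange=DEBUG
{code}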
[jira] [Commented] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory
[ https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086760#comment-15086760 ] Kai Zheng commented on HDFS-9621: - Thanks for the work Jing. The patch looks great. Some minor comments: 1. For this change: {code} + ecPolicy = fileNode.isStriped() ? ecPolicy : null; {code} How about: {code} ecPolicy = null; if (fileNode.isStriped()) { ecPolicy = FSDirErasureCodingOp.getErasureCodingPolicy(fsd.getFSNamesystem(), iip); } {code} 2. For this code: {code} +DirectoryListing listing = fs.getClient().listPaths(dir.toString(), +new byte[0], false); +HdfsFileStatus[] files = listing.getPartialListing(); +assertNotNull(files[0].getErasureCodingPolicy()); // ecSubDir +assertNull(files[1].getErasureCodingPolicy()); // replicatedFile {code} It might not be very reliable to rely on the listed entry order, considering {{listPaths}} or {{getPartialListing}} may change in implementation. > getListing wrongly associates Erasure Coding policy to pre-existing > replicated files under an EC directory > > > Key: HDFS-9621 > URL: https://issues.apache.org/jira/browse/HDFS-9621 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Sushmitha Sreenivasan >Assignee: Jing Zhao >Priority: Critical > Attachments: HDFS-9621.000.patch > > > This is reported by [~ssreenivasan]: > If we set Erasure Coding policy to a directory which contains some files with > replicated blocks, later when listing files under the directory these files > will be reported as EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory
[ https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086707#comment-15086707 ] Hadoop QA commented on HDFS-9621: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 11s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 49m 50s {color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 134m 42s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestDistributedFileSystem | | | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12780865/HDFS-9621.000.patch | | JIRA Issue | HDFS-9621 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 51ade031330c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchproce
[jira] [Commented] (HDFS-9047) Retire libwebhdfs
[ https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086683#comment-15086683 ] Haohui Mai commented on HDFS-9047: -- Looks like there is no effort on fixing anything. IMO +1 on removing them in 2.6 / 2.7 if it's breaking the pre-commit builds, but I'll leave the decision to the release manager. > Retire libwebhdfs > - > > Key: HDFS-9047 > URL: https://issues.apache.org/jira/browse/HDFS-9047 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Allen Wittenauer >Assignee: Haohui Mai > Fix For: 2.8.0 > > Attachments: HDFS-9047.000.patch > > > This library is basically a mess: > * It's not part of the mvn package > * It's missing functionality and barely maintained > * It's not in the precommit runs so doesn't get exercised regularly > * It's not part of the unit tests (at least, that I can see) > * It isn't documented in any official documentation > But most importantly: > * It fails at it's primary mission of being pure C (HDFS-3917 is STILL open) > Let's cut our losses and just remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-9047) Retire libwebhdfs
[ https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086683#comment-15086683 ] Haohui Mai edited comment on HDFS-9047 at 1/7/16 2:25 AM: -- Looks like there is no effort on fixing anything. IMO +1 on removing them in 2.6 / 2.7 if it's breaking the pre-commit builds, but I'll leave to the release manager to make the call. was (Author: wheat9): Looks like there is no effort on fixing anything. IMO +1 on removing them in 2.6 / 2.7 if it's breaking the pre-commit builds, but I'll leave the decision to the release manager. > Retire libwebhdfs > - > > Key: HDFS-9047 > URL: https://issues.apache.org/jira/browse/HDFS-9047 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Allen Wittenauer >Assignee: Haohui Mai > Fix For: 2.8.0 > > Attachments: HDFS-9047.000.patch > > > This library is basically a mess: > * It's not part of the mvn package > * It's missing functionality and barely maintained > * It's not in the precommit runs so doesn't get exercised regularly > * It's not part of the unit tests (at least, that I can see) > * It isn't documented in any official documentation > But most importantly: > * It fails at it's primary mission of being pure C (HDFS-3917 is STILL open) > Let's cut our losses and just remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS
[ https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086662#comment-15086662 ] Dinesh S. Atreya commented on HDFS-9607: Both [~ste...@apache.org] and [~wheat9] have raised good points. Can only work on the design doc during spare time. Don't want to have semantics of "update" too different from "append". Kindly indicate what are the semantics of "append" vis-a-vis above, if folks know and remember. (in-parallel I will dig that information up). As a start we will reuse the [Generation Stamp | https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc] of blocks from the Append effort. > Advance Hadoop Architecture (AHA) - HDFS > > > Key: HDFS-9607 > URL: https://issues.apache.org/jira/browse/HDFS-9607 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Dinesh S. Atreya > > Link to Umbrella JIRA > https://issues.apache.org/jira/browse/HADOOP-12620 > Provide capability to carry out in-place writes/updates. Only writes in-place > are supported where the existing length does not change. > For example, "Hello World" can be replaced by "Hello HDFS!" > See > https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300 > for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9622) getBlockLocations is always unstable
lichao liu created HDFS-9622: Summary: getBlockLocations is always unstable Key: HDFS-9622 URL: https://issues.apache.org/jira/browse/HDFS-9622 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Environment: CDH5.5.0 Reporter: lichao liu Query speed is slow in Impala (I am using CDH 5.5.0). Monitoring the Impala backend logs, I found that the long, time-consuming queries hit exceptions in the background, as follows: Tuple(id=0 size=40 slots=[Slot(id=0 type=STRING col_path=[4] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=1 type=BIGINT col_path=[5] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=2 type=BIGINT col_path=[6] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1), Slot(id=3 type=STRING col_path=[0] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=4 type=STRING col_path=[1] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=5 type=STRING col_path=[2] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1)] tuple_path=[]) Tuple(id=1 size=40 slots=[Slot(id=6 type=STRING col_path=[] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=7 type=BIGINT col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=8 type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1)] tuple_path=[]) Tuple(id=2 size=40 slots=[Slot(id=9 type=STRING col_path=[] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=10 type=BIGINT col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=11 type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1)] tuple_path=[]) I0106 09:46:59.656497 19278 plan-fragment-executor.cc:303] Open(): instance_id=794f58dadaa44cb8:1f24c33dda8d00a2 I0106 09:47:20.070286 6805 RetryInvocationHandler.java:144] Exception while invoking getBlockLocations of class ClientNamenodeProtocolTranslatorPB over namenode1:8020. Trying to fail over immediately. Java exception follows: org.apache.hadoop.net.ConnectTimeoutException: Call From datanode to namenode1:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. 
ch : java.nio.channels.SocketChannel[connection-pending namenode1:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750) at org.apache.hadoop.ipc.Client.call(Client.java:1476) at org.apache.hadoop.ipc.Client.call(Client.java:1403) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1258) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1245) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1233) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:302) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:268) at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:260) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1564) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:308) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:304) Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be read
[jira] [Updated] (HDFS-9620) Slow writer may fail permanently if pipeline breaks.
[ https://issues.apache.org/jira/browse/HDFS-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-9620: -- Component/s: security hdfs-client > Slow writer may fail permanently if pipeline breaks. > > > Key: HDFS-9620 > URL: https://issues.apache.org/jira/browse/HDFS-9620 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, security >Reporter: Kihwal Lee >Priority: Critical > > During a block write to a datanode, if the block write time exceed the block > token expiration, the client will not be able to reestablish a block output > stream. E.g. if a node in the pipeline dies, the pipeline recovery won't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086634#comment-15086634 ] Chris Douglas commented on HDFS-8891: - bq. shall we cherry-pick this fix to 2.6.4 as well? Yes, it [looks like|https://git1-us-west.apache.org/repos/asf?p=hadoop.git;a=blob;f=hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java;h=bd4f555110f9abdd4583041a4e7c8f0670cdc844;hb=branch-2.6#l2039] this is also in branch-2.6. > HDFS concat should keep srcs order > -- > > Key: HDFS-8891 > URL: https://issues.apache.org/jira/browse/HDFS-8891 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yong Zhang >Assignee: Yong Zhang >Priority: Blocker > Fix For: 2.7.2 > > Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch > > > FSDirConcatOp.verifySrcFiles may change the src files' order, but it should > keep their order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
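The underlying pitfall is a data-structure one: deduplicating the srcs through an unordered set drops the caller's ordering, while an insertion-ordered set keeps it. A small self-contained illustration of the difference, not the literal verifySrcFiles code:
{code}
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class ConcatOrderDemo {
  // Deduplicate while preserving the caller's order: LinkedHashSet iterates
  // in insertion order, whereas a plain HashSet may reorder arbitrarily.
  static String[] dedupPreservingOrder(String[] srcs) {
    Set<String> seen = new LinkedHashSet<>(Arrays.asList(srcs));
    return seen.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] srcs = {"/f3", "/f1", "/f2", "/f3"};
    // Prints [/f3, /f1, /f2] -- the input order, duplicates removed.
    System.out.println(Arrays.toString(dedupPreservingOrder(srcs)));
  }
}
{code}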
[jira] [Commented] (HDFS-9047) Retire libwebhdfs
[ https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086600#comment-15086600 ] Junping Du commented on HDFS-9047: -- Hi [~wheat9], what's the plan for branch-2.6/2.7? Remove it or fix it? > Retire libwebhdfs > - > > Key: HDFS-9047 > URL: https://issues.apache.org/jira/browse/HDFS-9047 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Allen Wittenauer >Assignee: Haohui Mai > Fix For: 2.8.0 > > Attachments: HDFS-9047.000.patch > > > This library is basically a mess: > * It's not part of the mvn package > * It's missing functionality and barely maintained > * It's not in the precommit runs so doesn't get exercised regularly > * It's not part of the unit tests (at least, that I can see) > * It isn't documented in any official documentation > But most importantly: > * It fails at it's primary mission of being pure C (HDFS-3917 is STILL open) > Let's cut our losses and just remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9047) Retire libwebhdfs
[ https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086599#comment-15086599 ] Hudson commented on HDFS-9047: -- SUCCESS: Integrated in Hadoop-trunk-Commit #9061 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9061/]) HDFS-9047. Retire libwebhdfs. Contributed by Haohui Mai. (wheat9: rev c213ee085971483d737a2d4652adfda0f767eea0) * hadoop-hdfs-project/hadoop-hdfs-native-client/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_query.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_web.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_query.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/resources/FindJansson.cmake > Retire libwebhdfs > - > > Key: HDFS-9047 > URL: https://issues.apache.org/jira/browse/HDFS-9047 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Allen Wittenauer >Assignee: Haohui Mai > Fix For: 2.8.0 > > Attachments: HDFS-9047.000.patch > > > This library is basically a mess: > * It's not part of the mvn package > * It's missing functionality and barely maintained > * It's not in the precommit runs so doesn't get exercised regularly > * It's not part of the unit tests (at least, that I can see) > * It isn't documented in any official documentation > But most importantly: > * It fails at it's primary mission of being pure C (HDFS-3917 is STILL open) > Let's cut our losses and just remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9047) Retire libwebhdfs
[ https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9047: - Release Note: libwebhdfs has been retired in 2.8.0 due to the lack of maintenance. > Retire libwebhdfs > - > > Key: HDFS-9047 > URL: https://issues.apache.org/jira/browse/HDFS-9047 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Allen Wittenauer >Assignee: Haohui Mai > Fix For: 2.8.0 > > Attachments: HDFS-9047.000.patch > > > This library is basically a mess: > * It's not part of the mvn package > * It's missing functionality and barely maintained > * It's not in the precommit runs so doesn't get exercised regularly > * It's not part of the unit tests (at least, that I can see) > * It isn't documented in any official documentation > But most importantly: > * It fails at it's primary mission of being pure C (HDFS-3917 is STILL open) > Let's cut our losses and just remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9047) Retire libwebhdfs
[ https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9047: - Target Version/s: (was: 3.0.0) > Retire libwebhdfs > - > > Key: HDFS-9047 > URL: https://issues.apache.org/jira/browse/HDFS-9047 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Allen Wittenauer >Assignee: Haohui Mai > Fix For: 2.8.0 > > Attachments: HDFS-9047.000.patch > > > This library is basically a mess: > * It's not part of the mvn package > * It's missing functionality and barely maintained > * It's not in the precommit runs so doesn't get exercised regularly > * It's not part of the unit tests (at least, that I can see) > * It isn't documented in any official documentation > But most importantly: > * It fails at it's primary mission of being pure C (HDFS-3917 is STILL open) > Let's cut our losses and just remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9047) Retire libwebhdfs
[ https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9047: - Resolution: Fixed Hadoop Flags: Reviewed,Incompatible change Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed the patch to trunk, branch-2 and branch-2.8. Thanks all for the reviews. > Retire libwebhdfs > - > > Key: HDFS-9047 > URL: https://issues.apache.org/jira/browse/HDFS-9047 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Allen Wittenauer >Assignee: Haohui Mai > Fix For: 2.8.0 > > Attachments: HDFS-9047.000.patch > > > This library is basically a mess: > * It's not part of the mvn package > * It's missing functionality and barely maintained > * It's not in the precommit runs so doesn't get exercised regularly > * It's not part of the unit tests (at least, that I can see) > * It isn't documented in any official documentation > But most importantly: > * It fails at it's primary mission of being pure C (HDFS-3917 is STILL open) > Let's cut our losses and just remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory
[ https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9621: Status: Patch Available (was: Open) > getListing wrongly associates Erasure Coding policy to pre-existing > replicated files under an EC directory > > > Key: HDFS-9621 > URL: https://issues.apache.org/jira/browse/HDFS-9621 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Sushmitha Sreenivasan >Assignee: Jing Zhao >Priority: Critical > Attachments: HDFS-9621.000.patch > > > This is reported by [~ssreenivasan]: > If we set Erasure Coding policy to a directory which contains some files with > replicated blocks, later when listing files under the directory these files > will be reported as EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory
[ https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9621: Attachment: HDFS-9621.000.patch Upload a patch to fix. > getListing wrongly associates Erasure Coding policy to pre-existing > replicated files under an EC directory > > > Key: HDFS-9621 > URL: https://issues.apache.org/jira/browse/HDFS-9621 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Sushmitha Sreenivasan >Assignee: Jing Zhao >Priority: Critical > Attachments: HDFS-9621.000.patch > > > This is reported by [~ssreenivasan]: > If we set Erasure Coding policy to a directory which contains some files with > replicated blocks, later when listing files under the directory these files > will be reported as EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086570#comment-15086570 ] Konstantin Shvachko commented on HDFS-8999: --- > Let's test with the last block to see if it already solves the problem. I hesitate to be so aggressive. Did you test without this patch? How? Maybe the problem is already solved just by HDFS-1172, as I argued above. > Namenode need not wait for {{blockReceived}} for the last block before > completing a file. > - > > Key: HDFS-8999 > URL: https://issues.apache.org/jira/browse/HDFS-8999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze > Attachments: h8999_20151228.patch, h8999_20160106.patch, > h8999_20160106b.patch, h8999_20160106c.patch > > > This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment > from the jira: > {quote} > ...whether we need to let the NameNode wait for all the block_received msgs to > announce that the replica is safe. Looking into the code, now we have: ># The NameNode knows the DataNodes involved when initially setting up the > writing pipeline. ># If any DataNode fails during the writing, the client bumps the GS and > finally reports all the DataNodes included in the new pipeline to the NameNode > through the updatePipeline RPC. ># When the client has received the ack for the last packet of the block (and > before the client tries to close the file on the NameNode), the replica has been > finalized on all the DataNodes. > Then in this case, when the NameNode receives the close request from the client, > the NameNode already knows the latest replicas for the block. Currently the > checkReplication call only counts the replicas for which the NN has already > received the block_received msg, but based on the above #2 and #3, it may be > safe to also count in all the replicas in > BlockUnderConstructionFeature#replicas? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory
[ https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9621: Issue Type: Sub-task (was: Bug) Parent: HDFS-8031 > getListing wrongly associates Erasure Coding policy to pre-existing > replicated files under an EC directory > > > Key: HDFS-9621 > URL: https://issues.apache.org/jira/browse/HDFS-9621 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Sushmitha Sreenivasan >Assignee: Jing Zhao >Priority: Critical > > This is reported by [~ssreenivasan]: > If we set Erasure Coding policy to a directory which contains some files with > replicated blocks, later when listing files under the directory these files > will be reported as EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086562#comment-15086562 ] Konstantin Shvachko commented on HDFS-8999: --- The COMPLETE state used to mean that the number of reported replicas is {{>= minReplication}}, not {{> 1}}. It would make sense to me to retain this logic. > Namenode need not wait for {{blockReceived}} for the last block before > completing a file. > - > > Key: HDFS-8999 > URL: https://issues.apache.org/jira/browse/HDFS-8999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze > Attachments: h8999_20151228.patch, h8999_20160106.patch, > h8999_20160106b.patch, h8999_20160106c.patch > > > This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment > from the jira: > {quote} > ...whether we need to let the NameNode wait for all the block_received msgs to > announce that the replica is safe. Looking into the code, now we have: ># The NameNode knows the DataNodes involved when initially setting up the > writing pipeline. ># If any DataNode fails during the writing, the client bumps the GS and > finally reports all the DataNodes included in the new pipeline to the NameNode > through the updatePipeline RPC. ># When the client has received the ack for the last packet of the block (and > before the client tries to close the file on the NameNode), the replica has been > finalized on all the DataNodes. > Then in this case, when the NameNode receives the close request from the client, > the NameNode already knows the latest replicas for the block. Currently the > checkReplication call only counts the replicas for which the NN has already > received the block_received msg, but based on the above #2 and #3, it may be > safe to also count in all the replicas in > BlockUnderConstructionFeature#replicas? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
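The two positions in this thread boil down to when the last block may be declared COMPLETE. A simplified stand-in sketch of the contrast (not BlockManager code): the classic rule counts only replicas reported via blockReceived against minReplication, while the proposal also accepts the pipeline membership the NameNode already tracks, on the argument that the client closes the file only after every pipeline DN has acked the last packet.
{code}
class LastBlockCompletion {
  int reportedReplicas;  // DNs that sent blockReceived for the latest GS
  int pipelineReplicas;  // DNs in the pipeline per the latest updatePipeline
  int minReplication = 1;

  boolean completeClassic() {
    // Historical semantics per the comment above: wait for actual reports.
    return reportedReplicas >= minReplication;
  }

  boolean completeRelaxed() {
    // Proposed relaxation: pipeline membership counts as evidence too.
    return Math.max(reportedReplicas, pipelineReplicas) >= minReplication;
  }
}
{code}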
[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory
[ https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9621: Summary: getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory(was: {{getListing}} wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory ) > getListing wrongly associates Erasure Coding policy to pre-existing > replicated files under an EC directory > > > Key: HDFS-9621 > URL: https://issues.apache.org/jira/browse/HDFS-9621 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Sushmitha Sreenivasan >Assignee: Jing Zhao >Priority: Blocker > > This is reported by [~ssreenivasan]: > If we set Erasure Coding policy to a directory which contains some files with > replicated blocks, later when listing files under the directory these files > will be reported as EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9621) getListing wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory
[ https://issues.apache.org/jira/browse/HDFS-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9621: Priority: Critical (was: Blocker) > getListing wrongly associates Erasure Coding policy to pre-existing > replicated files under an EC directory > > > Key: HDFS-9621 > URL: https://issues.apache.org/jira/browse/HDFS-9621 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Sushmitha Sreenivasan >Assignee: Jing Zhao >Priority: Critical > > This is reported by [~ssreenivasan]: > If we set Erasure Coding policy to a directory which contains some files with > replicated blocks, later when listing files under the directory these files > will be reported as EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9621) {{getListing}} wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory
Jing Zhao created HDFS-9621: --- Summary: {{getListing}} wrongly associates Erasure Coding policy to pre-existing replicated files under an EC directory Key: HDFS-9621 URL: https://issues.apache.org/jira/browse/HDFS-9621 Project: Hadoop HDFS Issue Type: Bug Components: erasure-coding Affects Versions: 3.0.0 Reporter: Sushmitha Sreenivasan Assignee: Jing Zhao Priority: Blocker This is reported by [~ssreenivasan]: If we set Erasure Coding policy to a directory which contains some files with replicated blocks, later when listing files under the directory these files will be reported as EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9047) Retire libwebhdfs
[ https://issues.apache.org/jira/browse/HDFS-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9047: - Summary: Retire libwebhdfs (was: deprecate libwebhdfs in branch-2; remove from trunk) > Retire libwebhdfs > - > > Key: HDFS-9047 > URL: https://issues.apache.org/jira/browse/HDFS-9047 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Allen Wittenauer >Assignee: Haohui Mai > Attachments: HDFS-9047.000.patch > > > This library is basically a mess: > * It's not part of the mvn package > * It's missing functionality and barely maintained > * It's not in the precommit runs so doesn't get exercised regularly > * It's not part of the unit tests (at least, that I can see) > * It isn't documented in any official documentation > But most importantly: > * It fails at its primary mission of being pure C (HDFS-3917 is STILL open) > Let's cut our losses and just remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS
[ https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086539#comment-15086539 ] Dinesh S. Atreya commented on HDFS-9607: Copying [comment | https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15083784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15083784] from parent/umbrella JIRA to here: {quote} [Haohui Mai | https://issues.apache.org/jira/secure/ViewProfile.jspa?name=wheat9] added a comment - Yesterday I agree that the capabilities can be quite powerful. The real issue is how it can be done. There are some questions that need to be answered: (1) What are the semantics of update-in-place, precisely, when there are failures? Is it atomic and transactional? What does the consistency model look like? What do the semantics and durability guarantees look like? For example, what happens if one of the DNs in the pipeline is down? What will the reader see? (2) Once you define the semantics, are the semantics / specification meaningful and complete? Do they cover all the failure cases? How do you evaluate and prove there are no corner cases? (3) How do you implement the semantics in code? What is the approach you are taking? Is it MVCC, a distributed transaction, or an ad-hoc solution tailored to HDFS? So far we all agree that it is a useful capability. I don't think it requires more communication to establish that it enables a number of new use cases. However, I don't see this as a complete solution without addressing Steve's questions and all the questions above. It would be beneficial to have a design doc and a working prototype to clarify the confusion. {quote} > Advance Hadoop Architecture (AHA) - HDFS > > > Key: HDFS-9607 > URL: https://issues.apache.org/jira/browse/HDFS-9607 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Dinesh S. Atreya > > Link to Umbrella JIRA > https://issues.apache.org/jira/browse/HADOOP-12620 > Provide the capability to carry out in-place writes/updates. Only in-place > writes where the existing length does not change are supported. > For example, "Hello World" can be replaced by "Hello HDFS!" > See > https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300 > for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
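To make the proposed constraint concrete: no HDFS API for this exists today, so the sketch below is purely hypothetical. It illustrates only the invariant that an in-place update may overwrite bytes but must not change the file's length, shown against an ordinary local file:

{code}
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch of fixed-length in-place update semantics. */
public class InPlaceUpdateSketch {
  // Overwrites exactly data.length bytes at offset; rejects any update
  // that would grow the file, mirroring the proposal's constraint.
  static void updateInPlace(RandomAccessFile file, long offset, byte[] data)
      throws IOException {
    if (offset < 0 || offset + data.length > file.length()) {
      throw new IOException("in-place update must not change file length");
    }
    file.seek(offset);
    file.write(data);
  }

  public static void main(String[] args) throws IOException {
    try (RandomAccessFile f = new RandomAccessFile("demo.txt", "rw")) {
      f.setLength(0);
      f.write("Hello World".getBytes("UTF-8"));
      // "HDFS!" is the same length as "World", so the file stays 11 bytes.
      updateInPlace(f, 6, "HDFS!".getBytes("UTF-8"));
    }
  }
}
{code}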
[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086483#comment-15086483 ] Wei-Chiu Chuang commented on HDFS-9619: --- The failed tests appear to be flaky ones, unrelated to this patch. Meanwhile, I have run TestBalancerWithMultipleNameNodes.testBalancer locally more than 600 times so far without any failures. > DataNode sometimes can not find blockpool for the correct namenode > -- > > Key: HDFS-9619 > URL: https://issues.apache.org/jira/browse/HDFS-9619 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: test > Attachments: HDFS-9619.001.patch, HDFS-9619.002.patch > > > We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to > replicate a file, because a data node is excluded. > {noformat} > File /tmp.txt could only be replicated to 0 nodes instead of minReplication > (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this > operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) > {noformat} > Relevant logs suggest the root cause is that the block pool was not found. 
> {noformat} > 2016-01-03 22:11:43,174 [DataXceiver for client > DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block > BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR > datanode.DataNode (DataXceiver.java:run(280)) - > host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: > /127.0.0.1:47318 dst: /127.0.0.1:49997 > java.io.IOException: Non existent blockpool > BP-1927700312-172.26.2.1-145188790 > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:203) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) > at java.lang.Thread.run(Thread.java:745) > {noformat} > For a bit more context, this test starts a cluster with two name nodes and > one data node. The block pools are added, but one of them is not found after > being added. The root cause is undetected concurrent access to a hash map > in SimulatedFSDataset (two block pools are added simultaneously). I added > some logs to print blockMap, and saw a few ConcurrentModificationExceptions. > The solution would be to use a thread-safe class instead, like > ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
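A minimal standalone sketch of that failure mode (illustrative only, not the SimulatedFSDataset code): two threads putting into a plain HashMap without synchronization can silently drop entries or throw ConcurrentModificationException, while ConcurrentHashMap tolerates the same concurrent adds:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BlockPoolMapSketch {
  public static void main(String[] args) throws InterruptedException {
    // Swap in "new java.util.HashMap<>()" here to reproduce lost
    // entries -- the "Non existent blockpool" symptom above.
    final Map<String, Object> blockMap = new ConcurrentHashMap<>();
    Thread t1 = new Thread(() -> {
      for (int i = 0; i < 10000; i++) blockMap.put("BP-A-" + i, new Object());
    });
    Thread t2 = new Thread(() -> {
      for (int i = 0; i < 10000; i++) blockMap.put("BP-B-" + i, new Object());
    });
    t1.start(); t2.start();
    t1.join(); t2.join();
    // With HashMap this is sometimes < 20000; with ConcurrentHashMap
    // every pool added is reliably visible.
    System.out.println("pools visible: " + blockMap.size());
  }
}
{code}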
[jira] [Updated] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy
[ https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-8647: - Target Version/s: 2.6.4 > Abstract BlockManager's rack policy into BlockPlacementPolicy > - > > Key: HDFS-8647 > URL: https://issues.apache.org/jira/browse/HDFS-8647 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Brahma Reddy Battula > Fix For: 2.8.0, 2.7.3, 2.6.4 > > Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, > HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch, > HDFS-8647-005.patch, HDFS-8647-006.patch, HDFS-8647-007.patch, > HDFS-8647-008.patch, HDFS-8647-009.patch, HDFS-8647-branch26.patch, > HDFS-8647-branch27.patch > > > Sometimes we want to have the namenode use an alternative block placement > policy such as upgrade domains in HDFS-7541. > BlockManager has built-in assumptions about rack policy in functions such as > useDelHint and blockHasEnoughRacks. That means when we have a new block > placement policy, we need to modify BlockManager to account for the new > policy. Ideally BlockManager should ask the BlockPlacementPolicy object > instead. That will allow us to provide a new BlockPlacementPolicy without > changing BlockManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
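The direction described can be sketched as follows (names are illustrative, not the actual patch): the rack rule moves behind the policy interface, so BlockManager simply asks the configured policy whether a block's placement is satisfied:

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: once the check is policy-owned, an alternative
// policy (e.g. upgrade domains) substitutes its own rule without any
// BlockManager change.
abstract class PlacementPolicySketch {
  abstract boolean isPlacementSatisfied(String[] replicaRacks);
}

class DefaultRackPolicySketch extends PlacementPolicySketch {
  @Override
  boolean isPlacementSatisfied(String[] replicaRacks) {
    Set<String> racks = new HashSet<>(Arrays.asList(replicaRacks));
    // Default rule: multi-replica blocks should span at least two racks.
    return replicaRacks.length <= 1 || racks.size() >= 2;
  }
}
{code}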
[jira] [Updated] (HDFS-9314) Improve BlockPlacementPolicyDefault's picking of excess replicas
[ https://issues.apache.org/jira/browse/HDFS-9314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-9314: - Target Version/s: 2.6.4 > Improve BlockPlacementPolicyDefault's picking of excess replicas > > > Key: HDFS-9314 > URL: https://issues.apache.org/jira/browse/HDFS-9314 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Xiao Chen > Fix For: 2.8.0, 2.7.3, 2.6.4 > > Attachments: HDFS-9314.001.patch, HDFS-9314.002.patch, > HDFS-9314.003.patch, HDFS-9314.004.patch, HDFS-9314.005.patch, > HDFS-9314.006.patch, HDFS-9314.branch26.patch, HDFS-9314.branch27.patch > > > The test case used in HDFS-9313 identified a NullPointerException as well as > a limitation in excess-replica picking. If the current replicas are on > {SSD(rack r1), DISK(rack r2), DISK(rack r3), DISK(rack r3)} and the storage > policy changes to HOT_STORAGE_POLICY_ID, BlockPlacementPolicyDefault won't > be able to delete the SSD replica. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
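The intended improvement can be pictured with this illustrative model (not the actual BlockPlacementPolicyDefault change): when the storage policy no longer wants a replica's storage type, that replica should be the preferred deletion candidate:

{code}
import java.util.List;
import java.util.Set;

class ExcessReplicaSketch {
  enum StorageType { SSD, DISK }
  static class Replica {
    final StorageType type;
    Replica(StorageType type) { this.type = type; }
  }

  // Prefer deleting a replica whose storage type the (new) policy does
  // not want -- e.g. the SSD replica once the policy becomes HOT -- and
  // only fall back to the usual heuristics among conforming replicas.
  // Assumes a non-empty candidate list.
  static Replica chooseExcess(List<Replica> replicas, Set<StorageType> wantedTypes) {
    for (Replica r : replicas) {
      if (!wantedTypes.contains(r.type)) {
        return r;
      }
    }
    return replicas.get(0); // placeholder for the existing rack/free-space logic
  }
}
{code}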
[jira] [Updated] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen
[ https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-9313: - Target Version/s: 2.6.4 > Possible NullPointerException in BlockManager if no excess replica can be > chosen > > > Key: HDFS-9313 > URL: https://issues.apache.org/jira/browse/HDFS-9313 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.8.0, 2.7.3, 2.6.4 > > Attachments: HDFS-9313-2.patch, HDFS-9313.branch26.patch, > HDFS-9313.branch27.patch, HDFS-9313.patch > > > HDFS-8647 makes it easier to reason about various block placement scenarios. > Here is one possible case where BlockManager won't be able to find the excess > replica to delete: when the storage policy changes around the same time the > balancer moves the block. When this happens, it causes a > NullPointerException. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978) > {noformat} > Note that it has not been seen in any production cluster; it was found by new > unit tests. In addition, the issue predates HDFS-8647. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
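The fix pattern the stack trace implies can be sketched like this (illustrative, not the committed patch): tolerate the legitimate case where no excess replica could be chosen instead of dereferencing null:

{code}
import java.util.Collection;

class AdjustSetsSketch {
  // chooseReplicaToDelete() may legitimately find no candidate (e.g. the
  // storage policy changed while the balancer moved the block), so the
  // caller must guard against a null "chosen" replica.
  static <T> void adjustSetsWithChosenReplica(Collection<T> moreThanOne,
                                              Collection<T> exactlyOne,
                                              T chosen) {
    if (chosen == null) {
      return; // previously this path ended in a NullPointerException
    }
    if (!moreThanOne.remove(chosen)) {
      exactlyOne.remove(chosen);
    }
  }
}
{code}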
[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086351#comment-15086351 ] Hadoop QA commented on HDFS-9619: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 7s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 47s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 130m 0s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark | | | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12780814/HDFS-9619.002.patch | | JIRA Issue | HDFS-9619 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 54993fa1453d 3.13.0-36-lowlat
[jira] [Commented] (HDFS-9620) Slow writer may fail permanently if pipeline breaks.
[ https://issues.apache.org/jira/browse/HDFS-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086340#comment-15086340 ] Kihwal Lee commented on HDFS-9620: -- The read path already has a mechanism for refetching block tokens, but it is currently not possible for writers to reacquire a block token for an existing block being written. > Slow writer may fail permanently if pipeline breaks. > > > Key: HDFS-9620 > URL: https://issues.apache.org/jira/browse/HDFS-9620 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Critical > > During a block write to a datanode, if the block write time exceeds the block > token expiration, the client will not be able to reestablish a block output > stream. E.g. if a node in the pipeline dies, the pipeline recovery won't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
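For contrast, the read-path recovery mentioned above follows roughly this shape (a schematic sketch, not DFSClient code; the exception type is a stand-in for an invalid-block-token error): on token failure, fresh block locations, which carry fresh tokens, are fetched and the read retried once. The point of this JIRA is that the write pipeline has no equivalent path:

{code}
import java.io.IOException;

class TokenRefetchSketch {
  interface BlockRead { byte[] run() throws IOException; }
  interface LocationFetcher { BlockRead refreshLocationsAndTokens() throws IOException; }

  static byte[] readWithRefetch(BlockRead attempt, LocationFetcher fetcher)
      throws IOException {
    try {
      return attempt.run();
    } catch (SecurityException expiredToken) { // stand-in exception type
      // Refetching locations from the NameNode yields fresh block tokens.
      return fetcher.refreshLocationsAndTokens().run();
    }
  }
}
{code}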
[jira] [Created] (HDFS-9620) Slow writer may fail permanently if pipeline breaks.
Kihwal Lee created HDFS-9620: Summary: Slow writer may fail permanently if pipeline breaks. Key: HDFS-9620 URL: https://issues.apache.org/jira/browse/HDFS-9620 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Critical During a block write to a datanode, if the block write time exceeds the block token expiration, the client will not be able to reestablish a block output stream. E.g. if a node in the pipeline dies, the pipeline recovery won't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086320#comment-15086320 ] Elliott Clark commented on HDFS-6440: - +1 for branch-2 please. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
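For reference, supporting a third NameNode should be mostly a configuration matter once this feature is in: the existing HA keys take one more logical name. A minimal hdfs-site.xml sketch (hostnames illustrative):

{code}
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>nn3.example.com:8020</value>
</property>
{code}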
[jira] [Commented] (HDFS-3599) Better expose when under-construction files are preventing DN decommission
[ https://issues.apache.org/jira/browse/HDFS-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086285#comment-15086285 ] Andrew Wang commented on HDFS-3599: --- I think that same check is still in DecommissionManager#isSufficient: {code} if (bc.isUnderConstruction() && block.equals(bc.getLastBlock())) { // Can decom a UC block as long as there will still be minReplicas if (blockManager.hasMinStorage(block, numLive)) { LOG.trace("UC block {} sufficiently-replicated since numLive ({}) " + ">= minR ({})", block, numLive, blockManager.getMinStorageNum(block)); return true; } } {code} Looking at the HDFS-7411 diff, it did not change the unit test introduced by HDFS-5579, so I think it was carried over correctly. The high-level point is that open files block decommission. If you try to decommission the 3 nodes that are writing the 3 replicas of a block, we can't drop below minReplication and still be able to complete the block. So, decommission will wait on 3-minRep of the nodes. DecommissionManager right now has tons of debug/trace prints for these kinds of issues. It'd be good to expose this as a metric or something, so it can be easily queried by admins. That, or we solve it once and for all by actively re-routing clients away from decommissioning nodes. There are a number of ideas for how we might do this. > Better expose when under-construction files are preventing DN decommission > -- > > Key: HDFS-3599 > URL: https://issues.apache.org/jira/browse/HDFS-3599 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Zhe Zhang > > Filing on behalf of Konstantin Olchanski: > {quote} > I have been trying to decommission a data node, but the process > stalled. I followed the correct instructions, observed my node > listed in "Decommissioning Nodes", etc, observed "Under Replicated Blocks" > decrease, etc. But the count went down to "1" and the decommission process > stalled. > There was no visible activity anywhere, nothing was happening (well, > maybe in some hidden log file somewhere something complained, > but I did not look). > It turns out that I had some files stuck in "OPENFORWRITE" mode, > as reported by "hdfs fsck / -openforwrite -files -blocks -locations -racks": > {code} > /users/trinat/data/.fuse_hidden177e0002 0 bytes, 0 block(s), > OPENFORWRITE: OK > /users/trinat/data/.fuse_hidden178d0003 0 bytes, 0 block(s), > OPENFORWRITE: OK > /users/trinat/data/.fuse_hidden1da30004 0 bytes, 1 block(s), > OPENFORWRITE: OK > 0. > BP-88378204-142.90.119.126-1340494203431:blk_6980480609696383665_20259{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[142.90.111.72:50010|RBW], > ReplicaUnderConstruction[142.90.119.162:50010|RBW], > ReplicaUnderConstruction[142.90.119.126:50010|RBW]]} len=0 repl=3 > [/detfac/142.90.111.72:50010, /isac2/142.90.119.162:50010, > /isac2/142.90.119.126:50010] > {code} > After I deleted those files, the decommission process completed successfully. > Perhaps one can add some visible indication somewhere on the HDFS status web > page > that the decommission process is stalled and maybe report why it is stalled? > Maybe the number of "OPENFORWRITE" files should be listed on the status page > next to the "Number of Under-Replicated Blocks"? (Since I know that nobody is > writing > to my HDFS, the non-zero count would give me a clue that something is wrong). 
> {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
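The metric Andrew suggests could take roughly this shape in Hadoop's metrics2 framework (a sketch with an assumed metric name; no such metric exists yet, and registration with the metrics system is omitted):

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(about = "Decommission progress", context = "dfs")
class DecommissionMetricsSketch {
  // Hypothetical gauge: how many open-for-write (under-construction)
  // blocks are currently the only thing stalling decommissioning nodes.
  @Metric("Open-for-write blocks stalling decommission")
  MutableGaugeInt numOpenFilesBlockingDecom;
}
{code}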
[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086281#comment-15086281 ] Wei-Chiu Chuang commented on HDFS-9619: --- TestBlockReplacement.testBlockReplacement is a flaky test that often fails. TestBlockStoragePolicy.testChangeHotFileRep appears to be a flaky test too. > DataNode sometimes can not find blockpool for the correct namenode > -- > > Key: HDFS-9619 > URL: https://issues.apache.org/jira/browse/HDFS-9619 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: test > Attachments: HDFS-9619.001.patch, HDFS-9619.002.patch > > > We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to > replicate a file, because a data node is excluded. > {noformat} > File /tmp.txt could only be replicated to 0 nodes instead of minReplication > (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this > operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) > {noformat} > Relevant logs suggest the root cause is that the block pool was not found. 
> {noformat} > 2016-01-03 22:11:43,174 [DataXceiver for client > DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block > BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR > datanode.DataNode (DataXceiver.java:run(280)) - > host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: > /127.0.0.1:47318 dst: /127.0.0.1:49997 > java.io.IOException: Non existent blockpool > BP-1927700312-172.26.2.1-145188790 > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:203) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) > at java.lang.Thread.run(Thread.java:745) > {noformat} > For a bit more context, this test starts a cluster with two name nodes and > one data node. The block pools are added, but one of them is not found after > being added. The root cause is undetected concurrent access to a hash map > in SimulatedFSDataset (two block pools are added simultaneously). I added > some logs to print blockMap, and saw a few ConcurrentModificationExceptions. > The solution would be to use a thread-safe class instead, like > ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086271#comment-15086271 ] Hadoop QA commented on HDFS-9618: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 56s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 54m 36s {color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 140m 42s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130 | | | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12780805/HDFS-9618.001.patch | | JIRA Issue | HDFS-9618 | | Optional Tests | asflicense compile javac javadoc mvnin
[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086225#comment-15086225 ] Hadoop QA commented on HDFS-9619: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 32s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 50m 30s {color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 130m 1s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestBlockReplacement | | | hadoop.hdfs.TestBlockStoragePolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12780801/HDFS-9619.001.patch | | JIRA Issue | HDFS-9619 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 30744bddc127 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precomm
[jira] [Commented] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086145#comment-15086145 ] Wei-Chiu Chuang commented on HDFS-9619: --- Well, maybe the test case is not needed. It's pretty obvious there is a concurrency bug when using a HashMap without a synchronized block. > DataNode sometimes can not find blockpool for the correct namenode > -- > > Key: HDFS-9619 > URL: https://issues.apache.org/jira/browse/HDFS-9619 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: test > Attachments: HDFS-9619.001.patch, HDFS-9619.002.patch > > > We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to > replicate a file, because a data node is excluded. > {noformat} > File /tmp.txt could only be replicated to 0 nodes instead of minReplication > (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this > operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) > {noformat} > Relevant logs suggest the root cause is that the block pool was not found. 
> {noformat} > 2016-01-03 22:11:43,174 [DataXceiver for client > DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block > BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR > datanode.DataNode (DataXceiver.java:run(280)) - > host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: > /127.0.0.1:47318 dst: /127.0.0.1:49997 > java.io.IOException: Non existent blockpool > BP-1927700312-172.26.2.1-145188790 > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:203) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) > at java.lang.Thread.run(Thread.java:745) > {noformat} > For a bit more context, this test starts a cluster with two name nodes and > one data node. The block pools are added, but one of them is not found after > being added. The root cause is undetected concurrent access to a hash map > in SimulatedFSDataset (two block pools are added simultaneously). I added > some logs to print blockMap, and saw a few ConcurrentModificationExceptions. > The solution would be to use a thread-safe class instead, like > ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9617) my Java client uses multiple threads to put the same file to the same HDFS URI; after a no-lease error, the client hits OutOfMemoryError
[ https://issues.apache.org/jira/browse/HDFS-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HDFS-9617. -- Resolution: Invalid > my Java client uses multiple threads to put the same file to the same HDFS > URI; after a no-lease error, the client hits OutOfMemoryError > --- > > Key: HDFS-9617 > URL: https://issues.apache.org/jira/browse/HDFS-9617 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zuotingbing > > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on /Tmp2/43.bmp.tmp (inode 2913263): File does not exist. [Lease. > Holder: DFSClient_NONMAPREDUCE_2084151715_1, pendingcreates: 250] > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3358) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3160) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3042) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1653) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > at org.apache.hadoop.ipc.Client.call(Client.java:1411) > at org.apache.hadoop.ipc.Client.call(Client.java:1364) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy14.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:391) > at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy15.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1473) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1290) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:536) > my Java client (JVM -Xmx=2G): > jmap TOP15: > num #instances #bytes class name > -- >1: 48072 2053976792 [B >2: 458525987568 >3: 458525878944 >4: 33634193112 >5: 33632548168 >6: 27332299008 >7: 5332191696 [Ljava.nio.ByteBuffer; >8: 247332026600 [C >9: 312872002368 > 
org.apache.hadoop.hdfs.DFSOutputStream$Packet > 10: 31972 767328 java.util.LinkedList$Node > 11: 22845 548280 java.lang.String > 12: 20372 488928 java.util.concurrent.atomic.AtomicLong > 13: 3700 452984 java.lang.Class > 14: 981 439576 > 15: 5583 376344 [S -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9617) my Java client uses multiple threads to put the same file to the same HDFS URI; after a no-lease error, the client hits OutOfMemoryError
[ https://issues.apache.org/jira/browse/HDFS-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086124#comment-15086124 ] Kihwal Lee commented on HDFS-9617: -- bq. my Java client uses multiple threads to put the same file to the same HDFS URI Unless each thread creates a separate instance of UserGroupInformation for its file system, they will all look like one writer to the namenode, causing all sorts of problems. > my Java client uses multiple threads to put the same file to the same HDFS > URI; after a no-lease error, the client hits OutOfMemoryError > --- > > Key: HDFS-9617 > URL: https://issues.apache.org/jira/browse/HDFS-9617 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zuotingbing > > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on /Tmp2/43.bmp.tmp (inode 2913263): File does not exist. [Lease. > Holder: DFSClient_NONMAPREDUCE_2084151715_1, pendingcreates: 250] > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3358) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3160) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3042) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1653) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > at org.apache.hadoop.ipc.Client.call(Client.java:1411) > at org.apache.hadoop.ipc.Client.call(Client.java:1364) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy14.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:391) > at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy15.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1473) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1290) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:536) > my Java 
client(JVM -Xmx=2G) : > jmap TOP15: > num #instances #bytes class name > -- >1: 48072 2053976792 [B >2: 458525987568 >3: 458525878944 >4: 33634193112 >5: 33632548168 >6: 27332299008 >7: 5332191696 [Ljava.nio.ByteBuffer; >8: 247332026600 [C >9: 312872002368 > org.apache.hadoop.hdfs.DFSOutputStream$Packet > 10: 31972 767328 java.util.LinkedList$Node > 11: 22845 548280 java.lang.String > 12: 20372 488928 java.util.concurrent.atomic.AtomicLong > 13: 3700 452984 java.lang.Class > 14: 981 439576 > 15: 5583 376344 [S -- This message was sent by Atlassian JIRA (v6.3.4#6332)
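If the goal really is many concurrent uploads from one JVM, a safer pattern looks roughly like the sketch below (illustrative; hostnames and paths are made up). FileSystem.get() hands every thread the same cached instance, so all threads share one DFSClient and appear as a single lease holder; FileSystem.newInstance() (or a distinct UserGroupInformation per thread with doAs) gives each thread its own client, and no two threads should create the same path:

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerThreadWriterSketch {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    final URI uri = URI.create("hdfs://namenode:8020"); // illustrative address
    for (int i = 0; i < 4; i++) {
      final int id = i;
      new Thread(() -> {
        // newInstance() bypasses the FileSystem cache, so each thread is
        // a distinct client (and lease holder) to the NameNode.
        try (FileSystem fs = FileSystem.newInstance(uri, conf)) {
          Path target = new Path("/Tmp2/43-" + id + ".bmp.tmp"); // unique per thread
          fs.copyFromLocalFile(new Path("file:///local/43.bmp"), target);
        } catch (Exception e) {
          e.printStackTrace();
        }
      }).start();
    }
  }
}
{code}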
[jira] [Commented] (HDFS-9576) HTrace: collect position/length information on read operations
[ https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086111#comment-15086111 ] Hadoop QA commented on HDFS-9576: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-hdfs-project/hadoop-hdfs-client (total was 136, now 137). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 13s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12780808/HDFS-9576.04.patch | | JIRA Issue | HDFS-9576 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 6c850e486918 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | |
[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9619: -- Attachment: HDFS-9619.002.patch Rev02: added a test case. The test case {{TestSimulatedFSDataset.testConcurrentAddBlockPool()}} starts two threads, which add different block pools concurrently and then attempt to add a block into each pool. If a block pool is not found, it throws an IOException. Without the rev01 change that uses ConcurrentHashMap, this test case always fails because it cannot find an added block pool; with the patch, I am not seeing any failures. > DataNode sometimes can not find blockpool for the correct namenode > -- > > Key: HDFS-9619 > URL: https://issues.apache.org/jira/browse/HDFS-9619 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: test > Attachments: HDFS-9619.001.patch, HDFS-9619.002.patch > > > We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} fail to > replicate a file, because a data node is excluded. > {noformat} > File /tmp.txt could only be replicated to 0 nodes instead of minReplication > (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this > operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) > {noformat} > Relevant logs suggest the root cause is that the block pool was not found. 
> {noformat} > 2016-01-03 22:11:43,174 [DataXceiver for client > DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block > BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR > datanode.DataNode (DataXceiver.java:run(280)) - > host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: > /127.0.0.1:47318 dst: /127.0.0.1:49997 > java.io.IOException: Non existent blockpool > BP-1927700312-172.26.2.1-145188790 > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) > at java.lang.Thread.run(Thread.java:745) > {noformat} > For a bit more context, this test starts a cluster with two name nodes and > one data node. The block pools are added, but one of them is not found after > added. The root cause is due to an undetected concurrent access in a hash map > in SimulatedFSDataset (two block pools are added simultaneously). I added > some logs to print blockMap, and saw a few ConcurrentModificationExceptions. > The solution would be to use a thread safe class instead, like > ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
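To make the race concrete, here is a minimal, self-contained sketch of the kind of test described above (hypothetical class and pool names, not the actual {{TestSimulatedFSDataset}} code): two threads register different block pools in a shared map and then look their own pool up again. With a plain {{HashMap}} the lookup can intermittently miss an entry during a concurrent resize; swapping in {{ConcurrentHashMap}} removes the race.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentAddBlockPoolSketch {
  // Swap in a plain java.util.HashMap here to see why the unpatched
  // SimulatedFSDataset could intermittently "lose" a block pool.
  private static final Map<String, String> blockMap = new ConcurrentHashMap<>();

  private static Runnable addPool(final String bpid) {
    return () -> {
      for (int i = 0; i < 1000; i++) {
        blockMap.put(bpid + "-" + i, "pool-data");
        if (blockMap.get(bpid + "-" + i) == null) {
          // Mirrors the "Non existent blockpool" IOException in the report.
          throw new IllegalStateException("Non existent blockpool " + bpid);
        }
      }
    };
  }

  public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread(addPool("BP-1"));
    Thread t2 = new Thread(addPool("BP-2"));
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println("added " + blockMap.size() + " entries without losing any");
  }
}
{code}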
[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086101#comment-15086101 ] Jing Zhao commented on HDFS-8999: - # How about making BlockNotYetCompleteException simply an IOException and then in {{appendFile}} wrapping it inside a {{RetriableException}} (like the current {{checkNameNodeSafeMode}})? In this way we can depend on the existing retry logic for {{RetriableException}} and do not need explicit retry in {{callAppend}}. # We may need a unit test for the append retry in a block-not-yet-complete scenario. # In {{commitOrCompleteLastBlock}} and {{addStoredBlock}}, it looks like we do not need the {{hasMinStorage}} check when adding the replicas to the pending queue? Otherwise the block may later be put into the under-replicated queue with {{QUEUE_WITH_CORRUPT_BLOCKS}} priority. If this change makes sense to you, we may also need another unit test here.
{code}
if (hasMinStorage(lastBlock)) {
  if (b) {
    addExpectedReplicasToPending(lastBlock, bc);
  }
{code}
> Namenode need not wait for {{blockReceived}} for the last block before > completing a file. > - > > Key: HDFS-8999 > URL: https://issues.apache.org/jira/browse/HDFS-8999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze > Attachments: h8999_20151228.patch, h8999_20160106.patch, > h8999_20160106b.patch, h8999_20160106c.patch > > > This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment > from the jira: > {quote} > ...whether we need to let NameNode wait for all the block_received msgs to > announce the replica is safe. Looking into the code, now we have ># NameNode knows the DataNodes involved when initially setting up the > writing pipeline ># If any DataNode fails during the writing, client bumps the GS and > finally reports all the DataNodes included in the new pipeline to NameNode > through the updatePipeline RPC. ># When the client received the ack for the last packet of the block (and > before the client tries to close the file on NameNode), the replica has been > finalized in all the DataNodes. > Then in this case, when NameNode receives the close request from the client, > the NameNode already knows the latest replicas for the block. Currently the > checkReplication call only counts in all the replicas that NN has already > received the block_received msg, but based on the above #2 and #3, it may be > safe to also count in all the replicas in the > BlockUnderConstructionFeature#replicas? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
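As a rough illustration of suggestion #1 above, the wrapping pattern might look like the following sketch. All names here are assumed stand-ins, not the committed Hadoop code; the real {{RetriableException}} lives in {{org.apache.hadoop.ipc}}.
{code}
import java.io.IOException;

public class AppendRetrySketch {
  /** Stand-in for the proposed plain-IOException BlockNotYetCompleteException. */
  static class BlockNotYetCompleteException extends IOException {
    BlockNotYetCompleteException(String msg) { super(msg); }
  }

  /** Stand-in for org.apache.hadoop.ipc.RetriableException. */
  static class RetriableException extends IOException {
    RetriableException(IOException cause) { super(cause); }
  }

  // appendFile() wraps the block-not-yet-complete condition in a
  // RetriableException, so the existing client-side retry policy re-issues
  // the RPC and callAppend() needs no explicit retry loop of its own.
  static void appendFile(boolean lastBlockComplete) throws IOException {
    if (!lastBlockComplete) {
      throw new RetriableException(
          new BlockNotYetCompleteException("last block is not yet complete"));
    }
    // ... proceed with the append ...
  }

  public static void main(String[] args) {
    try {
      appendFile(false);
    } catch (IOException e) {
      System.out.println("retriable failure: " + e.getCause().getMessage());
    }
  }
}
{code}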
[jira] [Commented] (HDFS-9498) Move code that tracks blocks with future generation stamps to BlockManagerSafeMode
[ https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086076#comment-15086076 ] Hudson commented on HDFS-9498: -- SUCCESS: Integrated in Hadoop-trunk-Commit #9058 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9058/]) HDFS-9498. Move code that tracks blocks with future generation stamps to (arp: rev 67c9780609f707c11626f05028ddfd28f1b878f1) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogRace.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManagerSafeMode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerSafeMode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > Move code that tracks blocks with future generation stamps to > BlockManagerSafeMode > -- > > Key: HDFS-9498 > URL: https://issues.apache.org/jira/browse/HDFS-9498 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, > HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch > > > [HDFS-4015] counts and reports orphaned blocks > {{numberOfBytesInFutureBlocks}} in safe mode. It was implemented in > {{BlockManager}}. Per discussion in [HDFS-9129] which introduces the > {{BlockManagerSafeMode}}, we can move code that maintaining orphaned blocks > to this class. > Leaving safe mode checks blocks with future GS in {{FSNamesystem}}. This code > can also be moved to {{BlockManagerSafeMode}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9617) my java client use muti-thread to put a same file to a same hdfs uri, after no lease error,then client OutOfMemoryError
[ https://issues.apache.org/jira/browse/HDFS-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086069#comment-15086069 ] Mingliang Liu commented on HDFS-9617: - If this is not yet confirmed a bug or a feature request, please send email to [mailto:u...@hadoop.apache.org]. People there are willing to help you with your problems. > my java client use muti-thread to put a same file to a same hdfs uri, after > no lease error,then client OutOfMemoryError > --- > > Key: HDFS-9617 > URL: https://issues.apache.org/jira/browse/HDFS-9617 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zuotingbing > > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on /Tmp2/43.bmp.tmp (inode 2913263): File does not exist. [Lease. > Holder: DFSClient_NONMAPREDUCE_2084151715_1, pendingcreates: 250] > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3358) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3160) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3042) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1653) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > at org.apache.hadoop.ipc.Client.call(Client.java:1411) > at org.apache.hadoop.ipc.Client.call(Client.java:1364) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy14.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:391) > at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy15.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1473) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1290) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:536) > my java client(JVM -Xmx=2G) : > jmap TOP15: > num #instances #bytes class name > -- 
> 1: 48072 2053976792 [B
> 2: 458525987568
> 3: 458525878944
> 4: 33634193112
> 5: 33632548168
> 6: 27332299008
> 7: 5332191696 [Ljava.nio.ByteBuffer;
> 8: 247332026600 [C
> 9: 312872002368 org.apache.hadoop.hdfs.DFSOutputStream$Packet
> 10: 31972 767328 java.util.LinkedList$Node
> 11: 22845 548280 java.lang.String
> 12: 20372 488928 java.util.concurrent.atomic.AtomicLong
> 13: 3700 452984 java.lang.Class
> 14: 981 439576
> 15: 5583 376344 [S
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
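For context on how this situation can arise, here is a hedged sketch of the reported client pattern (the thread count and write sizes are assumptions, not taken from the reporter's code; only the path comes from the report). Each thread re-creates the same path with {{overwrite=true}}, which deletes and re-creates the file on the NameNode and so invalidates the other writers' leases; their queued {{addBlock}} calls then fail with {{LeaseExpiredException}} ("File does not exist") while unacknowledged {{DFSOutputStream$Packet}} objects pile up on the client heap, consistent with the jmap histogram above.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcurrentSamePathPut {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    final Path target = new Path("/Tmp2/43.bmp.tmp"); // path from the report
    final byte[] chunk = new byte[1 << 20];
    for (int i = 0; i < 4; i++) {
      new Thread(() -> {
        // overwrite=true: each create() re-creates the file, revoking the
        // lease any other in-flight writer is holding on the same path.
        try (FSDataOutputStream out = fs.create(target, true)) {
          for (int j = 0; j < 250; j++) {
            out.write(chunk); // packets queue up while the pipeline stalls
          }
        } catch (Exception e) {
          e.printStackTrace(); // typically LeaseExpiredException: No lease on ...
        }
      }).start();
    }
  }
}
{code}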
[jira] [Updated] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9618: Component/s: namenode > Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-9618.001.patch > > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086063#comment-15086063 ] Mingliang Liu commented on HDFS-9618: - +1 (non-binding) > Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-9618.001.patch > > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9498) Move code that tracks blocks with future generation stamps to BlockManagerSafeMode
[ https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086055#comment-15086055 ] Mingliang Liu commented on HDFS-9498: - Thank you [~arpitagarwal] for your insightful comments and commit. Thanks to [~anu] for his original effort of tracking blocks with future GS, and for code review. > Move code that tracks blocks with future generation stamps to > BlockManagerSafeMode > -- > > Key: HDFS-9498 > URL: https://issues.apache.org/jira/browse/HDFS-9498 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, > HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch > > > [HDFS-4015] counts and reports orphaned blocks > {{numberOfBytesInFutureBlocks}} in safe mode. It was implemented in > {{BlockManager}}. Per discussion in [HDFS-9129] which introduces the > {{BlockManagerSafeMode}}, we can move code that maintaining orphaned blocks > to this class. > Leaving safe mode checks blocks with future GS in {{FSNamesystem}}. This code > can also be moved to {{BlockManagerSafeMode}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086050#comment-15086050 ] Mingliang Liu commented on HDFS-9618: - We have 5 levels of priority queues and the aggregation should be fast. Leaving it as it is may be better, though. > Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-9618.001.patch > > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9498) Move code that tracks blocks with future generation stamps to BlockManagerSafeMode
[ https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9498: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2 for 2.9.0. branch-2.8 conflicts looked non-trivial so I skipped including it for 2.8.0. Thanks for the contribution [~liuml07] and for the review [~anu]. > Move code that tracks blocks with future generation stamps to > BlockManagerSafeMode > -- > > Key: HDFS-9498 > URL: https://issues.apache.org/jira/browse/HDFS-9498 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, > HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch > > > [HDFS-4015] counts and reports orphaned blocks > {{numberOfBytesInFutureBlocks}} in safe mode. It was implemented in > {{BlockManager}}. Per discussion in [HDFS-9129] which introduces the > {{BlockManagerSafeMode}}, we can move code that maintaining orphaned blocks > to this class. > Leaving safe mode checks blocks with future GS in {{FSNamesystem}}. This code > can also be moved to {{BlockManagerSafeMode}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086043#comment-15086043 ] Mingliang Liu commented on HDFS-9618: - Thanks for working on this, [~iwasakims]. Calling {{neededReplications.size()}} or {{pendingReplications.size()}} seems to have low overhead, so the guard can be removed, as [~drankye] proposed. The code that logs which blocks have been scheduled for replication should be kept, as we iterate over all recovery work and its targets. > Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-9618.001.patch > > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9576) HTrace: collect position/length information on read operations
[ https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086033#comment-15086033 ] Hadoop QA commented on HDFS-9576: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-hdfs-project/hadoop-hdfs-client (total was 136, now 137). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 22m 12s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12780803/HDFS-9576.03.patch | | JIRA Issue | HDFS-9576 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 68830c147581 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Bu
[jira] [Updated] (HDFS-9576) HTrace: collect position/length information on read operations
[ https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9576: Attachment: HDFS-9576.04.patch Thanks Xiao for the good catch! Updating the patch to address it. Also renaming {{readScope}} to {{scope}} to be consistent with other places that use temporary scope variables. > HTrace: collect position/length information on read operations > -- > > Key: HDFS-9576 > URL: https://issues.apache.org/jira/browse/HDFS-9576 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, tracing >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, > HDFS-9576.02.patch, HDFS-9576.03.patch, HDFS-9576.04.patch > > > HTrace currently collects the path of each read operation (both stateful and > position reads). To better understand applications' I/O behavior, it is also > useful to track the position and length of read operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-9618: --- Affects Version/s: (was: 3.0.0) 2.8.0 Status: Patch Available (was: Open) > Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-9618.001.patch > > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-9618: --- Attachment: HDFS-9618.001.patch Thanks for the comment, [~drankye]. I attached 001. bq. Then is there any reason for the following block? The reason seems to be that {{UnderReplicatedBlocks#size}} is not just an accessor but does some aggregation. I left that part as is in the attached patch. > Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-9618.001.patch > > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
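For readers following along, the mismatch and the fix reduce to the pattern below (a simplified sketch, not the exact {{BlockManager}} code; the logger name and messages are assumptions):
{code}
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogGuardSketch {
  private static final Logger blockLog =
      LoggerFactory.getLogger("BlockStateChange");

  // The reported mismatch, reduced to its shape: a DEBUG message is built
  // under an INFO guard, so the (potentially large) string is constructed
  // whenever INFO is enabled even though it only prints at DEBUG.
  static void logScheduledWorkBefore(List<String> blocks) {
    if (blockLog.isInfoEnabled()) {
      StringBuilder sb = new StringBuilder("BLOCK* scheduled recovery for:");
      for (String b : blocks) {
        sb.append(' ').append(b);
      }
      blockLog.debug(sb.toString());
    }
  }

  // The fix: guard the construction with the level the statement logs at.
  static void logScheduledWorkAfter(List<String> blocks) {
    if (blockLog.isDebugEnabled()) {
      StringBuilder sb = new StringBuilder("BLOCK* scheduled recovery for:");
      for (String b : blocks) {
        sb.append(' ').append(b);
      }
      blockLog.debug(sb.toString());
    }
  }
}
{code}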
[jira] [Commented] (HDFS-9576) HTrace: collect position/length information on read operations
[ https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086016#comment-15086016 ] Xiao Chen commented on HDFS-9576: - Thanks for the work, and sorry for jumping in. {code} scope.addKVAnnotation("requiredLength", Integer.toString(reqLen)); {code} Should the key be "requestedLength"? > HTrace: collect position/length information on read operations > -- > > Key: HDFS-9576 > URL: https://issues.apache.org/jira/browse/HDFS-9576 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, tracing >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, > HDFS-9576.02.patch, HDFS-9576.03.patch > > > HTrace currently collects the path of each read operation (both stateful and > position reads). To better understand applications' I/O behavior, it is also > useful to track the position and length of read operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
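A sketch of the annotation pattern under review, using the HTrace 4 API (the method structure and the {{doPread}} helper are assumptions; only the {{addKVAnnotation}} call shape is taken from the quoted snippet): the read's trace scope is tagged with position and length so a trace consumer can reconstruct per-operation I/O behavior.
{code}
import org.apache.htrace.core.TraceScope;
import org.apache.htrace.core.Tracer;

public class TracedReadSketch {
  static int tracedPread(Tracer tracer, long position, int reqLen) {
    try (TraceScope scope = tracer.newScope("DFSInputStream#byteArrayPread")) {
      // Annotate where the read starts and how much was asked for.
      scope.addKVAnnotation("position", Long.toString(position));
      scope.addKVAnnotation("requestedLength", Integer.toString(reqLen));
      int realLen = doPread(position, reqLen);
      // Annotate how much was actually read.
      scope.addKVAnnotation("realLength", Integer.toString(realLen));
      return realLen;
    }
  }

  // Hypothetical stand-in for the actual positional read.
  private static int doPread(long position, int reqLen) {
    return reqLen;
  }
}
{code}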
[jira] [Updated] (HDFS-9498) Move code that tracks blocks with future generation stamps to BlockManagerSafeMode
[ https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9498: Summary: Move code that tracks blocks with future generation stamps to BlockManagerSafeMode (was: Move code that tracks orphan blocks to BlockManagerSafeMode) > Move code that tracks blocks with future generation stamps to > BlockManagerSafeMode > -- > > Key: HDFS-9498 > URL: https://issues.apache.org/jira/browse/HDFS-9498 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, > HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch > > > [HDFS-4015] counts and reports orphaned blocks > {{numberOfBytesInFutureBlocks}} in safe mode. It was implemented in > {{BlockManager}}. Per discussion in [HDFS-9129] which introduces the > {{BlockManagerSafeMode}}, we can move code that maintaining orphaned blocks > to this class. > Leaving safe mode checks blocks with future GS in {{FSNamesystem}}. This code > can also be moved to {{BlockManagerSafeMode}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085978#comment-15085978 ] Masatake Iwasaki commented on HDFS-9618: This was wrong. The log level was changed by HDFS-6860. > Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9576) HTrace: collect path/offset/length information on read operations
[ https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9576: Description: HTrace currently collects the path of each read operation (both stateful and position reads). To better understand applications' I/O behavior, it is also useful to track the position and length of read operations. > HTrace: collect path/offset/length information on read operations > - > > Key: HDFS-9576 > URL: https://issues.apache.org/jira/browse/HDFS-9576 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, tracing >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, > HDFS-9576.02.patch, HDFS-9576.03.patch > > > HTrace currently collects the path of each read operation (both stateful and > position reads). To better understand applications' I/O behavior, it is also > useful to track the position and length of read operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9576) HTrace: collect position/length information on read operations
[ https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9576: Summary: HTrace: collect position/length information on read operations (was: HTrace: collect path/offset/length information on read operations) > HTrace: collect position/length information on read operations > -- > > Key: HDFS-9576 > URL: https://issues.apache.org/jira/browse/HDFS-9576 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, tracing >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, > HDFS-9576.02.patch, HDFS-9576.03.patch > > > HTrace currently collects the path of each read operation (both stateful and > position reads). To better understand applications' I/O behavior, it is also > useful to track the position and length of read operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9576) HTrace: collect path/offset/length information on read operations
[ https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9576: Attachment: HDFS-9576.03.patch Good catch! Updating the patch to address it. > HTrace: collect path/offset/length information on read operations > - > > Key: HDFS-9576 > URL: https://issues.apache.org/jira/browse/HDFS-9576 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, tracing >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, > HDFS-9576.02.patch, HDFS-9576.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9576) HTrace: collect path/offset/length information on read operations
[ https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9576: Summary: HTrace: collect path/offset/length information on read operations (was: HTrace: collect path/offset/length information on read and write operations) > HTrace: collect path/offset/length information on read operations > - > > Key: HDFS-9576 > URL: https://issues.apache.org/jira/browse/HDFS-9576 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, tracing >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, > HDFS-9576.02.patch, HDFS-9576.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9619: -- Component/s: test datanode > DataNode sometimes can not find blockpool for the correct namenode > -- > > Key: HDFS-9619 > URL: https://issues.apache.org/jira/browse/HDFS-9619 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: test > Attachments: HDFS-9619.001.patch > > > We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} failed to > replicate a file, because a data node is excluded. > {noformat} > File /tmp.txt could only be replicated to 0 nodes instead of minReplication > (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this > operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) > {noformat} > Relevent logs suggest root cause is due to block pool not found. 
> {noformat} > 2016-01-03 22:11:43,174 [DataXceiver for client > DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block > BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR > datanode.DataNode (DataXceiver.java:run(280)) - > host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: > /127.0.0.1:47318 dst: /127.0.0.1:49997 > java.io.IOException: Non existent blockpool > BP-1927700312-172.26.2.1-145188790 > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) > at java.lang.Thread.run(Thread.java:745) > {noformat} > For a bit more context, this test starts a cluster with two name nodes and > one data node. The block pools are added, but one of them is not found after > added. The root cause is due to an undetected concurrent access in a hash map > in SimulatedFSDataset (two block pools are added simultaneously). I added > some logs to print blockMap, and saw a few ConcurrentModificationExceptions. > The solution would be to use a thread safe class instead, like > ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9619: -- Labels: test (was: ) > DataNode sometimes can not find blockpool for the correct namenode > -- > > Key: HDFS-9619 > URL: https://issues.apache.org/jira/browse/HDFS-9619 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: test > Attachments: HDFS-9619.001.patch > > > We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} failed to > replicate a file, because a data node is excluded. > {noformat} > File /tmp.txt could only be replicated to 0 nodes instead of minReplication > (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this > operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) > {noformat} > Relevent logs suggest root cause is due to block pool not found. 
> {noformat} > 2016-01-03 22:11:43,174 [DataXceiver for client > DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block > BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR > datanode.DataNode (DataXceiver.java:run(280)) - > host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: > /127.0.0.1:47318 dst: /127.0.0.1:49997 > java.io.IOException: Non existent blockpool > BP-1927700312-172.26.2.1-145188790 > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) > at java.lang.Thread.run(Thread.java:745) > {noformat} > For a bit more context, this test starts a cluster with two name nodes and > one data node. The block pools are added, but one of them is not found after > added. The root cause is due to an undetected concurrent access in a hash map > in SimulatedFSDataset (two block pools are added simultaneously). I added > some logs to print blockMap, and saw a few ConcurrentModificationExceptions. > The solution would be to use a thread safe class instead, like > ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9619: -- Status: Patch Available (was: Open) > DataNode sometimes can not find blockpool for the correct namenode > -- > > Key: HDFS-9619 > URL: https://issues.apache.org/jira/browse/HDFS-9619 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9619.001.patch > > > We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} failed to > replicate a file, because a data node is excluded. > {noformat} > File /tmp.txt could only be replicated to 0 nodes instead of minReplication > (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this > operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) > {noformat} > Relevent logs suggest root cause is due to block pool not found. 
> {noformat} > 2016-01-03 22:11:43,174 [DataXceiver for client > DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block > BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR > datanode.DataNode (DataXceiver.java:run(280)) - > host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: > /127.0.0.1:47318 dst: /127.0.0.1:49997 > java.io.IOException: Non existent blockpool > BP-1927700312-172.26.2.1-145188790 > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) > at java.lang.Thread.run(Thread.java:745) > {noformat} > For a bit more context, this test starts a cluster with two name nodes and > one data node. The block pools are added, but one of them is not found after > added. The root cause is due to an undetected concurrent access in a hash map > in SimulatedFSDataset (two block pools are added simultaneously). I added > some logs to print blockMap, and saw a few ConcurrentModificationExceptions. > The solution would be to use a thread safe class instead, like > ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9619: -- Description: We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} failed to replicate a file, because a data node is excluded. {noformat} File /tmp.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) {noformat} Relevent logs suggest root cause is due to block pool not found. {noformat} 2016-01-03 22:11:43,174 [DataXceiver for client DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR datanode.DataNode (DataXceiver.java:run(280)) - host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:47318 dst: /127.0.0.1:49997 java.io.IOException: Non existent blockpool BP-1927700312-172.26.2.1-145188790 at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) at java.lang.Thread.run(Thread.java:745) {noformat} For a bit more context, this test starts a cluster with two name nodes and one data node. The block pools are added, but one of them is not found after added. The root cause is due to an undetected concurrent access in a hash map in SimulatedFSDataset (two block pools are added simultaneously). I added some logs to print blockMap, and saw a few ConcurrentModificationExceptions. 
The solution would be to use a thread safe class instead, like ConcurrentHashMap. was: We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} failed to replicate a file, because a data node is excluded. {noformat} File /tmp.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:230
[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9619: -- Attachment: HDFS-9619.001.patch Rev01. Use ConcurrentHashMap instead of HashMap in SimulatedFSDataset to store block pools. Tested locally. Before the patch, the test failed 1 in 10 runs. After the patch, I've been running for ~100 runs without seeing any failures. > DataNode sometimes can not find blockpool for the correct namenode > -- > > Key: HDFS-9619 > URL: https://issues.apache.org/jira/browse/HDFS-9619 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9619.001.patch > > > We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} failed to > replicate a file, because a data node is excluded. > {noformat} > File /tmp.txt could only be replicated to 0 nodes instead of minReplication > (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this > operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) > {noformat} > Relevent logs suggest root cause is due to block pool not found. 
> {noformat} > 2016-01-03 22:11:43,174 [DataXceiver for client > DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block > BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR > datanode.DataNode (DataXceiver.java:run(280)) - > host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: > /127.0.0.1:47318 dst: /127.0.0.1:49997 > java.io.IOException: Non existent blockpool > BP-1927700312-172.26.2.1-145188790 > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) > at java.lang.Thread.run(Thread.java:745) > {noformat} > For a bit more context, this test starts a cluster with two name nodes and > one data node. The block pools are added, but one of them is not found after > added. The root cause is due to an undetected concurrent access in a hash map > in SimulatedFSDataset (two block pools are added simultaneously). The > solution would be to use a thread safe class instead, like ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9615) Fix variable name typo in DFSConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085950#comment-15085950 ] Hudson commented on HDFS-9615: -- SUCCESS: Integrated in Hadoop-trunk-Commit #9057 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9057/]) HDFS-9615. Fix variable name typo in DFSConfigKeys. (Contributed by Ray (arp: rev b9936689c9ea37bf0050e7970643bcddfc9cfdbe) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java > Fix variable name typo in DFSConfigKeys > --- > > Key: HDFS-9615 > URL: https://issues.apache.org/jira/browse/HDFS-9615 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HDFS-9615.001.patch > > > Ran across this typo in the variable name: > DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT > should clearly be > DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDOWN_DEFAULT > i.e. the "N" and the "W" are swapped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
[ https://issues.apache.org/jira/browse/HDFS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9619: -- Description: We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} failed to replicate a file, because a data node is excluded. {noformat} File /tmp.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) {noformat} Relevent logs suggest root cause is due to block pool not found. {noformat} 2016-01-03 22:11:43,174 [DataXceiver for client DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR datanode.DataNode (DataXceiver.java:run(280)) - host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:47318 dst: /127.0.0.1:49997 java.io.IOException: Non existent blockpool BP-1927700312-172.26.2.1-145188790 at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) at java.lang.Thread.run(Thread.java:745) {noformat} For a bit more context, this test starts a cluster with two name nodes and one data node. The block pools are added, but one of them is not found after added. The root cause is due to an undetected concurrent access in a hash map in SimulatedFSDataset (two block pools are added simultaneously). The solution would be to use a thread safe class instead, like ConcurrentHashMap. 
was: We sometimes see {{TestBalancerWithMultipleNameNodes.testBalancer}} failed to replicate a file, because a data node is excluded. {noformat} File /tmp.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) at java.security.AccessController.doPrivileged(Native Method) at javax.security.a
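As a minimal sketch of the proposed fix (hypothetical names, not the actual SimulatedFSDataset code), a registry keyed by block pool id could use {{ConcurrentHashMap}} so that two pools can be registered concurrently without losing an entry:
{code}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the proposed fix, assuming a registry keyed by
// block pool id. With a plain HashMap, two concurrent addBlockPool()
// calls can corrupt the table so that one pool is never visible,
// producing the "Non existent blockpool" error above.
public class BlockPoolRegistry {
  private final Map<String, Map<Long, byte[]>> blockMap =
      new ConcurrentHashMap<>();

  public void addBlockPool(String bpid) {
    // computeIfAbsent is atomic on ConcurrentHashMap, so simultaneous
    // registrations of different pools cannot lose an entry.
    blockMap.computeIfAbsent(bpid, k -> new ConcurrentHashMap<>());
  }

  public Map<Long, byte[]> getMap(String bpid) throws IOException {
    Map<Long, byte[]> map = blockMap.get(bpid);
    if (map == null) {
      throw new IOException("Non existent blockpool " + bpid);
    }
    return map;
  }
}
{code}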
[jira] [Commented] (HDFS-9615) Fix variable name typo in DFSConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085943#comment-15085943 ] Ray Chiang commented on HDFS-9615: -- Thanks for the review and the commit! > Fix variable name typo in DFSConfigKeys > --- > > Key: HDFS-9615 > URL: https://issues.apache.org/jira/browse/HDFS-9615 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HDFS-9615.001.patch > > > Ran across this typo in the variable name: > DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT > should clearly be > DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDOWN_DEFAULT > i.e. the "N" and the "W" are swapped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9576) HTrace: collect path/offset/length information on read and write operations
[ https://issues.apache.org/jira/browse/HDFS-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085932#comment-15085932 ] Masatake Iwasaki commented on HDFS-9576: I agree with fixing the tracing of writes in a follow-up. The 02 patch looks good, with one nit: the variable should not be named {{ignored}} because it is now used. > HTrace: collect path/offset/length information on read and write operations > --- > > Key: HDFS-9576 > URL: https://issues.apache.org/jira/browse/HDFS-9576 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, tracing >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9576.00.patch, HDFS-9576.01.patch, > HDFS-9576.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9498) Move code that tracks orphan blocks to BlockManagerSafeMode
[ https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085933#comment-15085933 ] Anu Engineer commented on HDFS-9498: +1 (non-binding), LGTM > Move code that tracks orphan blocks to BlockManagerSafeMode > --- > > Key: HDFS-9498 > URL: https://issues.apache.org/jira/browse/HDFS-9498 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9498.000.patch, HDFS-9498.001.patch, > HDFS-9498.002.patch, HDFS-9498.003.patch, HDFS-9498.004.patch > > > [HDFS-4015] counts and reports orphaned blocks > {{numberOfBytesInFutureBlocks}} in safe mode. It was implemented in > {{BlockManager}}. Per discussion in [HDFS-9129], which introduces the > {{BlockManagerSafeMode}}, we can move the code that maintains orphaned blocks > to this class. > Leaving safe mode checks blocks with future GS in {{FSNamesystem}}. This code > can also be moved to {{BlockManagerSafeMode}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
Wei-Chiu Chuang created HDFS-9619: - Summary: DataNode sometimes can not find blockpool for the correct namenode Key: HDFS-9619 URL: https://issues.apache.org/jira/browse/HDFS-9619 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Environment: Jenkins Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang We sometimes see TestBalancerWithMultipleNameNodes.testBalancer fail to replicate a file, because a data node is excluded. {noformat} File /tmp.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299) {noformat} Relevant logs suggest the root cause is a block pool not being found. {noformat} 2016-01-03 22:11:43,174 [DataXceiver for client DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR datanode.DataNode (DataXceiver.java:run(280)) - host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:47318 dst: /127.0.0.1:49997 java.io.IOException: Non existent blockpool BP-1927700312-172.26.2.1-145188790 at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583) at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955) at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:203) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253) at java.lang.Thread.run(Thread.java:745) {noformat} For a bit more context, this test starts a cluster with two name nodes and one data node. The block pools are added, but one of them is not found after being added. The root cause is an undetected concurrent access in a hash map in SimulatedFSDataset.
The solution would be to use a thread-safe class instead, such as ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9615) Fix variable name typo in DFSConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9615: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) I've committed this to trunk. Since HDFS-6353, which introduced this setting, is not in branch-2, no commit to branch-2 is required. Thanks for the contribution [~rchiang]. > Fix variable name typo in DFSConfigKeys > --- > > Key: HDFS-9615 > URL: https://issues.apache.org/jira/browse/HDFS-9615 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HDFS-9615.001.patch > > > Ran across this typo in the variable name: > DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT > should clearly be > DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDOWN_DEFAULT > i.e. the "N" and the "W" are swapped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9615) Fix variable name typo in DFSConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9615: Summary: Fix variable name typo in DFSConfigKeys (was: Fix variable name typo in DFSConfigKeys#DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT) > Fix variable name typo in DFSConfigKeys > --- > > Key: HDFS-9615 > URL: https://issues.apache.org/jira/browse/HDFS-9615 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Trivial > Attachments: HDFS-9615.001.patch > > > Ran across this typo in the variable name: > DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDONW_DEFAULT > should clearly be > DFS_NAMENODE_MISSING_CHECKPOINT_PERIODS_BEFORE_SHUTDOWN_DEFAULT > i.e. the "N" and the "W" are swapped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode
[ https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085887#comment-15085887 ] Daryn Sharp commented on HDFS-9276: --- I'll be taking a look to ensure this doesn't break our IP-failover HA. > Failed to Update HDFS Delegation Token for long running application in HA mode > -- > > Key: HDFS-9276 > URL: https://issues.apache.org/jira/browse/HDFS-9276 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, ha, security >Affects Versions: 2.7.1 >Reporter: Liangliang Gu >Assignee: Liangliang Gu > Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, HDFS-9276.12.patch, HDFS-9276.13.patch, debug1.PNG, debug2.PNG > >
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications.
> The HDFS Client will generate private tokens for each NameNode. When we update the HDFS Delegation Token, these private tokens will not be updated, which will cause the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
>     val keytab = "/path/to/keytab/xxx.keytab"
>     val principal = "x...@abc.com"
>     val creds1 = new org.apache.hadoop.security.Credentials()
>     val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>     ugi1.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         val fs = FileSystem.get(new Configuration())
>         fs.addDelegationTokens("test", creds1)
>         null
>       }
>     })
>     val ugi = UserGroupInformation.createRemoteUser("test")
>     ugi.addCredentials(creds1)
>     ugi.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         var i = 0
>         while (true) {
>           val creds1 = new org.apache.hadoop.security.Credentials()
>           val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>           ugi1.doAs(new PrivilegedExceptionAction[Void] {
>             // Get a copy of the credentials
>             override def run(): Void = {
>               val fs = FileSystem.get(new Configuration())
>               fs.addDelegationTokens("test", creds1)
>               null
>             }
>           })
>           UserGroupInformation.getCurrentUser.addCredentials(creds1)
>           val fs = FileSystem.get(new Configuration())
>           i += 1
>           println()
>           println(i)
>           println(fs.listFiles(new Path("/user"), false))
>           Thread.sleep(60 * 1000)
>         }
>         null
>       }
>     })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is: > {code} > Exception in thread "main" > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) >
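As a conceptual sketch of the failure mode described above (the service names here are made up for illustration, and this is not the actual HDFS client code): {{Credentials}} keys tokens by service, and in HA the per-NameNode private copies are separate entries from the logical one, so refreshing only the logical entry leaves the private copies holding the expiring token.
{code}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

// Conceptual sketch only: illustrates why updating one entry in the
// credentials map does not refresh the HA private copies.
public class PrivateTokenSketch {
  public static void refreshLogicalOnly(Credentials creds,
      Token<? extends TokenIdentifier> freshToken) {
    // The logical HA entry and the per-NameNode private entries are
    // distinct slots in the credentials map (service names made up):
    //   "ha-hdfs:mycluster"      <- logical token
    //   "nn1.example.com:8020"   <- private copy for NN1
    //   "nn2.example.com:8020"   <- private copy for NN2
    // Re-adding the logical entry replaces only that slot; the private
    // copies keep the old token bytes and eventually expire.
    creds.addToken(new Text("ha-hdfs:mycluster"), freshToken);
  }
}
{code}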
[jira] [Commented] (HDFS-6142) StandbyException wrapped to InvalidToken exception
[ https://issues.apache.org/jira/browse/HDFS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085867#comment-15085867 ] Kihwal Lee commented on HDFS-6142: -- bq. For example, in datanode's DataXceiver.copyBlock, it will call checkAccess... That's block token, not delegation token. > StandbyException wrapped to InvalidToken exception > -- > > Key: HDFS-6142 > URL: https://issues.apache.org/jira/browse/HDFS-6142 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.2.0 >Reporter: Ding Yuan > > The following code in org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java:
> {noformat}
> public byte[] retrievePassword(DelegationTokenIdentifier identifier) throws InvalidToken {
>   try {
>     // this check introduces inconsistency in the authentication to a
>     // HA standby NN. non-token auths are allowed into the namespace which
>     // decides whether to throw a StandbyException. tokens are a bit
>     // different in that a standby may be behind and thus not yet know
>     // of all tokens issued by the active NN. the following check does
>     // not allow ANY token auth, however it should allow known tokens in
>     namesystem.checkOperation(OperationCategory.READ);
>   } catch (StandbyException se) {
>     // FIXME: this is a hack to get around changing method signatures by
>     // tunneling a non-InvalidToken exception as the cause which the
>     // RPC server will unwrap before returning to the client
>     InvalidToken wrappedStandby = new InvalidToken("StandbyException");
>     wrappedStandby.initCause(se);
>     throw wrappedStandby;
>   }
>   return super.retrievePassword(identifier);
> }
> {noformat}
> A StandbyException from namesystem.checkOperation is wrapped into an InvalidToken exception. The comment suggests that the RPC server will unwrap it to StandbyException before sending it back to the client, but this may not be the case for every code path. For example, in datanode's DataXceiver.copyBlock, it will call checkAccess which eventually might call retrievePassword, but when copyBlock catches an InvalidToken exception, it would simply send that exception to the client without unwrapping it. > I am not exactly sure about the possible consequence, but it seems the client treats StandbyException (which is perhaps much more serious) very differently from an InvalidToken exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
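A minimal sketch of the missing unwrap step the report argues for (a hypothetical helper, not existing Hadoop code): callers that catch the wrapper on paths the RPC server does not cover could surface the tunneled cause themselves.
{code}
import java.io.IOException;
import org.apache.hadoop.ipc.StandbyException;
import org.apache.hadoop.security.token.SecretManager.InvalidToken;

// Hypothetical helper: restore the StandbyException that
// retrievePassword tunneled inside the InvalidToken wrapper, so the
// client sees the retriable standby condition rather than a token error.
public final class TokenExceptionUtil {
  private TokenExceptionUtil() {}

  public static IOException unwrap(InvalidToken wrapper) {
    Throwable cause = wrapper.getCause();
    if (cause instanceof StandbyException) {
      return (StandbyException) cause; // the real, retriable condition
    }
    return wrapper; // a genuinely invalid token; keep as-is
  }
}
{code}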
[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085715#comment-15085715 ] Hadoop QA commented on HDFS-8999: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 34s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 34s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 37s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 37s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s {color} | {color:red} Patch generated 4 new checkstyle issues in hadoop-hdfs-project (total was 633, now 632). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 33s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 16s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 174m 43s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.hdfs.server.namenode.snapshot.TestSnapsho
[jira] [Commented] (HDFS-9279) Decommissioned capacity should not be considered for configured/used capacity
[ https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085577#comment-15085577 ] Kihwal Lee commented on HDFS-9279: -- bq. Because the data present in the decommissioning nodes would eventually be transferred over to the live nodes. Is this understanding correct? The replicas are not invalidated on decommissioning nodes even after replicating, so the capacity tracking was not accurate either. It ended up double counting the used space toward the end, at which point the process seems to stall more frequently nowadays (this is another topic). If a significant portion of a cluster is decommissioned, the stat will look very strange and confuse people. That actually happened to us multiple times. The free/total ratio will look considerably smaller than the actual value. Monitoring tools cannot easily dismiss it as 'Nah.. it's a temporary discrepancy caused by decommissioning.' With this change, the storage capacity stat has become more like a regular under-replication scenario caused by node/disk outages. Additional space will be used for re-replicating those blocks, but it is not yet allocated to those blocks. That's the actual state of used/usable storage and the stat reflects that now. If we want the stat to reflect what would be used in the future, we are talking about a space reservation feature. > Decommissioned capacity should not be considered for configured/used capacity > > > Key: HDFS-9279 > URL: https://issues.apache.org/jira/browse/HDFS-9279 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9279-v1.patch, HDFS-9279-v2.patch, > HDFS-9279-v3.patch, HDFS-9279-v4.patch > > > Capacity of a decommissioned node is being accounted as configured and used > capacity metrics. This gives an incorrect perception of cluster usage. > Once a node is decommissioned, its capacity should be considered similar to a > dead node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9617) my Java client uses multiple threads to put the same file to the same HDFS URI; after a no-lease error, the client hits OutOfMemoryError
[ https://issues.apache.org/jira/browse/HDFS-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085526#comment-15085526 ] Kai Zheng commented on HDFS-9617: - Thanks for reporting this. bq. my Java client uses multiple threads to put the same file to the same HDFS URI I'm a little confused. How did you do this, and what does your code look like? Would you elaborate a little bit? It may help to understand why it happened. Thanks. > my Java client uses multiple threads to put the same file to the same HDFS URI; after a no-lease error, the client hits OutOfMemoryError > --- > > Key: HDFS-9617 > URL: https://issues.apache.org/jira/browse/HDFS-9617 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zuotingbing > > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on /Tmp2/43.bmp.tmp (inode 2913263): File does not exist. [Lease. > Holder: DFSClient_NONMAPREDUCE_2084151715_1, pendingcreates: 250] > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3358) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3160) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3042) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1653) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > at org.apache.hadoop.ipc.Client.call(Client.java:1411) > at org.apache.hadoop.ipc.Client.call(Client.java:1364) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy14.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:391) > at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy15.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1473) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1290) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:536) > 
my Java client (JVM -Xmx=2G):
> jmap TOP15:
> num #instances #bytes class name
> --
> 1: 48072 2053976792 [B
> 2: 458525987568
> 3: 458525878944
> 4: 33634193112
> 5: 33632548168
> 6: 27332299008
> 7: 5332191696 [Ljava.nio.ByteBuffer;
> 8: 247332026600 [C
> 9: 312872002368 org.apache.hadoop.hdfs.DFSOutputStream$Packet
> 10: 31972 767328 java.util.LinkedList$Node
> 11: 22845 548280 java.lang.String
> 12: 20372 488928 java.util.concurrent.atomic.AtomicLong
> 13: 3700 452984 java.lang.Class
> 14: 981 439576
> 15: 5583 376344 [S
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
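Pending the reporter's answer, here is a hypothetical reconstruction of the pattern the summary describes (made-up path and payload size, not the reporter's code). Each {{create(path, true)}} recreates the file and takes over the lease, so the other writers hit LeaseExpiredException while their unacked {{DFSOutputStream$Packet}} objects stay queued on the heap, which would match the jmap histogram above:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical repro sketch, NOT the reporter's code: every thread
// races to create and write the SAME target path.
public class SamePathPut {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    final Path target = new Path("/Tmp2/43.bmp");
    final byte[] data = new byte[64 * 1024 * 1024]; // stand-in payload
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int i = 0; i < 8; i++) {
      pool.submit(() -> {
        // overwrite=true deletes whatever file is in progress, so the
        // previous writer loses its lease on the path mid-write.
        try (FSDataOutputStream out = fs.create(target, true)) {
          out.write(data);
        } catch (Exception e) {
          e.printStackTrace(); // typically LeaseExpiredException here
        }
      });
    }
    pool.shutdown();
  }
}
{code}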
[jira] [Commented] (HDFS-9612) DistCp worker threads are not terminated after jobs are done.
[ https://issues.apache.org/jira/browse/HDFS-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085507#comment-15085507 ] Hadoop QA commented on HDFS-9612: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 57s {color} | {color:red} hadoop-tools_hadoop-distcp-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 51, now 51). 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 53s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 50s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 26s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12780744/HDFS-9612.005.patch | | JIRA Issue | HDFS-9612 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux b36a01e84ad5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ma
[jira] [Updated] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8999: -- Attachment: h8999_20160106c.patch h8999_20160106c.patch: fixes an NPE. > Namenode need not wait for {{blockReceived}} for the last block before > completing a file. > - > > Key: HDFS-8999 > URL: https://issues.apache.org/jira/browse/HDFS-8999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze > Attachments: h8999_20151228.patch, h8999_20160106.patch, > h8999_20160106b.patch, h8999_20160106c.patch > > > This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment > from the jira: > {quote} > ...whether we need to let NameNode wait for all the block_received msgs to > announce the replica is safe. Looking into the code, now we have ># NameNode knows the DataNodes involved when initially setting up the > writing pipeline ># If any DataNode fails during the writing, client bumps the GS and > finally reports all the DataNodes included in the new pipeline to NameNode > through the updatePipeline RPC. ># When the client received the ack for the last packet of the block (and > before the client tries to close the file on NameNode), the replica has been > finalized in all the DataNodes. > Then in this case, when NameNode receives the close request from the client, > the NameNode already knows the latest replicas for the block. Currently the > checkReplication call only counts in all the replicas that NN has already > received the block_received msg, but based on the above #2 and #3, it may be > safe to also count in all the replicas in the > BlockUnderConstructionFeature#replicas? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
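A simplified sketch of the relaxed completion check being discussed (plain Java collections standing in for the actual BlockManager structures, so this is an illustration, not the attached patch): per points #2 and #3 of the quote, the locations still in the pipeline have finalized the replica by close time, so they can be counted before their blockReceived messages arrive.
{code}
import java.util.HashSet;
import java.util.Set;

// Simplified illustration of the idea: count pipeline locations
// alongside replicas already confirmed by blockReceived when deciding
// whether the last block of a closing file is sufficiently replicated.
public class LastBlockCheck {
  public static boolean isSufficientlyReplicated(
      Set<String> confirmedByBlockReceived,
      Set<String> pipelineLocations,
      int minReplication) {
    Set<String> usable = new HashSet<>(confirmedByBlockReceived);
    usable.addAll(pipelineLocations); // finalized per pipeline tracking
    return usable.size() >= minReplication;
  }
}
{code}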
[jira] [Updated] (HDFS-9612) DistCp worker threads are not terminated after jobs are done.
[ https://issues.apache.org/jira/browse/HDFS-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9612: -- Attachment: HDFS-9612.005.patch Rev05: added @throws to make Javadoc happy. > DistCp worker threads are not terminated after jobs are done. > - > > Key: HDFS-9612 > URL: https://issues.apache.org/jira/browse/HDFS-9612 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9612.001.patch, HDFS-9612.002.patch, > HDFS-9612.003.patch, HDFS-9612.004.patch, HDFS-9612.005.patch > > > In HADOOP-11827, a producer-consumer style thread pool was introduced to > parallelize the task of listing files/directories. > We have a use case where a distcp job is run during the commit phase of an MR2 > job. However, it was found that distcp does not terminate ProducerConsumer thread > pools properly. Because threads are not terminated, those MR2 jobs never > finish. > In a more typical use case where distcp is run as a standalone job, those > threads are terminated forcefully when the Java process is terminated. So > these leaked threads did not become a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
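A generic sketch of the cleanup this issue is about (made-up names, not the actual ProducerConsumer class): a non-daemon worker pool has to be shut down explicitly once the listing is drained, otherwise it keeps the embedding process alive.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Generic illustration of the leak and its fix: without an explicit
// shutdown, the non-daemon worker threads outlive the copy and an
// embedding MR2 job never finishes.
public class ListingPool {
  private final ExecutorService workers = Executors.newFixedThreadPool(4);

  public void submit(Runnable listingTask) {
    workers.submit(listingTask);
  }

  public void shutdown() throws InterruptedException {
    workers.shutdownNow(); // interrupt idle or blocked workers
    workers.awaitTermination(30, TimeUnit.SECONDS);
  }
}
{code}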
[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085482#comment-15085482 ] Hadoop QA commented on HDFS-8999: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 38s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 38s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 38s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s {color} | {color:red} Patch generated 4 new checkstyle issues in hadoop-hdfs-project (total was 633, now 632). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 44s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 4s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s {color} | {color:red} Patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 151m 15s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestModTime | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.hdfs.server.namenode.TestFileContextAcl | | | hadoop.hdfs.TestErasureCodingPolicies | | | hado
[jira] [Commented] (HDFS-9279) Decommissioned capacity should not be considered for configured/used capacity
[ https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085445#comment-15085445 ] Rajat Jain commented on HDFS-9279: -- While it makes sense not to include decommissioning nodes in configured capacity, they should still be used for calculating used capacity, because the data present in the decommissioning nodes would eventually be transferred over to the live nodes. Is this understanding correct? > Decommissioned capacity should not be considered for configured/used capacity > > > Key: HDFS-9279 > URL: https://issues.apache.org/jira/browse/HDFS-9279 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9279-v1.patch, HDFS-9279-v2.patch, > HDFS-9279-v3.patch, HDFS-9279-v4.patch > > > Capacity of a decommissioned node is being accounted as configured and used > capacity metrics. This gives an incorrect perception of cluster usage. > Once a node is decommissioned, its capacity should be considered similar to a > dead node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085371#comment-15085371 ] Kai Zheng commented on HDFS-9618: - Good catch! The {{logger.isInfoEnabled}} guard pattern shouldn't be used without a reason. I guess the case in question uses the {{blockLog.isInfoEnabled()}} condition to decide whether to compose and write the log message, for performance reasons. Is there any reason for the following block then? Better to change it as well in this fix.
{code}
if (blockLog.isDebugEnabled()) {
  blockLog.debug("BLOCK* neededReplications = {} pendingReplications = {}",
      neededReplications.size(), pendingReplications.size());
}
{code}
> Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085335#comment-15085335 ] Masatake Iwasaki commented on HDFS-9618: The log level had been info, but it seems to have been changed to debug in the EC branch (6b6a63bb). > Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
[ https://issues.apache.org/jira/browse/HDFS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085324#comment-15085324 ] Masatake Iwasaki commented on HDFS-9618:
{code}
if (blockLog.isInfoEnabled()) {  // guard checks the INFO level...
  // log which blocks have been scheduled for replication
  for (BlockRecoveryWork rw : recovWork) {
    DatanodeStorageInfo[] targets = rw.getTargets();
    if (targets != null && targets.length != 0) {
      StringBuilder targetList = new StringBuilder("datanode(s)");
      for (DatanodeStorageInfo target : targets) {
        targetList.append(' ');
        targetList.append(target.getDatanodeDescriptor());
      }
      blockLog.debug("BLOCK* ask {} to replicate {} to {}",  // ...but the message is logged at DEBUG
          rw.getSrcNodes(), rw.getBlock(), targetList);
    }
  }
}
{code}
> Fix mismatch between log level and guard in > BlockManager#computeRecoveryWorkForBlocks > - > > Key: HDFS-9618 > URL: https://issues.apache.org/jira/browse/HDFS-9618 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > > Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
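For reference, a minimal sketch of the matched guard (simplified types, not the actual BlockManager code), assuming the intent is to keep the message at debug level:
{code}
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Simplified illustration of the fix: the guard level matches the call
// level, so the message is neither built needlessly when only INFO is
// enabled nor silently dropped.
public class GuardSketch {
  private static final Logger blockLog =
      LoggerFactory.getLogger("BlockStateChange");

  static void logScheduled(String srcNodes, String block,
      List<String> targets) {
    if (blockLog.isDebugEnabled()) { // was isInfoEnabled() in the report
      StringBuilder targetList = new StringBuilder("datanode(s)");
      for (String t : targets) {
        targetList.append(' ').append(t);
      }
      blockLog.debug("BLOCK* ask {} to replicate {} to {}",
          srcNodes, block, targetList);
    }
  }
}
{code}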
[jira] [Created] (HDFS-9618) Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks
Masatake Iwasaki created HDFS-9618: -- Summary: Fix mismatch between log level and guard in BlockManager#computeRecoveryWorkForBlocks Key: HDFS-9618 URL: https://issues.apache.org/jira/browse/HDFS-9618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Debug log message is constructed when {{Logger#isInfoEnabled}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8999: -- Attachment: h8999_20160106b.patch h8999_20160106b.patch: addresses Jing's comment. > Namenode need not wait for {{blockReceived}} for the last block before > completing a file. > - > > Key: HDFS-8999 > URL: https://issues.apache.org/jira/browse/HDFS-8999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze > Attachments: h8999_20151228.patch, h8999_20160106.patch, > h8999_20160106b.patch > > > This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment > from the jira: > {quote} > ...whether we need to let NameNode wait for all the block_received msgs to > announce the replica is safe. Looking into the code, now we have ># NameNode knows the DataNodes involved when initially setting up the > writing pipeline ># If any DataNode fails during the writing, client bumps the GS and > finally reports all the DataNodes included in the new pipeline to NameNode > through the updatePipeline RPC. ># When the client received the ack for the last packet of the block (and > before the client tries to close the file on NameNode), the replica has been > finalized in all the DataNodes. > Then in this case, when NameNode receives the close request from the client, > the NameNode already knows the latest replicas for the block. Currently the > checkReplication call only counts in all the replicas that NN has already > received the block_received msg, but based on the above #2 and #3, it may be > safe to also count in all the replicas in the > BlockUnderConstructionFeature#replicas? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)