[jira] [Resolved] (HDFS-12649) handling of corrupt blocks not suitable for commodity hardware

2017-10-12 Thread Gruust (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gruust resolved HDFS-12649.
---
Resolution: Invalid

> handling of corrupt blocks not suitable for commodity hardware
> --
>
> Key: HDFS-12649
> URL: https://issues.apache.org/jira/browse/HDFS-12649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.8.1
>Reporter: Gruust
>Priority: Minor
>
> Hadoop's documentation tells me it's suitable for commodity hardware in the 
> sense that hardware failures are expected to happen frequently. However, 
> there is currently no automatic handling of corrupted blocks, which seems a 
> bit contradictory to me.
> See: 
> https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files
> This is even problematic for data integrity as the redundancy is not kept at 
> the desired level without manual intervention and therefore in a timely 
> manner. If there is a corrupted block, I would at least expect that the 
> namenode forces the creation of an additional good replica to keep up the 
> redundancy level, i.e., the redundancy level should never include corrupted 
> data... which it currently does:
> "UnderReplicatedBlocks" : 0,
> "CorruptBlocks" : 2,
> (namenode /jmx http dump)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12653) Implement toArray() and toSubArray() for ReadOnlyList

2017-10-12 Thread Manoj Govindassamy (JIRA)
Manoj Govindassamy created HDFS-12653:
-

 Summary: Implement toArray() and toSubArray() for ReadOnlyList
 Key: HDFS-12653
 URL: https://issues.apache.org/jira/browse/HDFS-12653
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy


{{ReadOnlyList}} today gives an unmodifiable view of the backing List. It 
also provides the following utility methods for easy construction of 
read-only views of any given list:

{noformat}
public static <E> ReadOnlyList<E> asReadOnlyList(final List<E> list)

public static <E> List<E> asList(final ReadOnlyList<E> list)
{noformat}

{{asList}} above additionally overrides {{Object[] toArray()}} of the 
{{java.util.List}} interface. Unlike {{java.util.List}}, this implementation 
returns an array of references to the backing list's elements and avoids 
copying the objects themselves. Given that we have many usages of read-only 
lists:

1. Let's have a light-weight / shared-view {{toArray()}} implementation for 
{{ReadOnlyList}} as well.
2. Additionally, similar to {{java.util.List#subList(fromIndex, toIndex)}}, 
let's have {{ReadOnlyList#subArray(fromIndex, toIndex)}}.
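A minimal sketch of what a shared-view {{toArray()}}/{{subArray()}} could look like. The class and method names here are illustrative stand-ins, not the actual Hadoop {{ReadOnlyList}} API:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a read-only wrapper whose toArray()/subArray()
// return arrays of references to the backing list's elements, so the
// element objects themselves are shared rather than copied.
class ReadOnlyView<E> {
    private final List<E> backing;

    ReadOnlyView(List<E> backing) {
        this.backing = backing;
    }

    // Array of references to the backing elements (the references are
    // copied into a fresh array; the elements are not cloned).
    Object[] toArray() {
        return backing.toArray();
    }

    // Analogue of List#subList(fromIndex, toIndex): the elements in
    // [fromIndex, toIndex) as an array of shared references.
    Object[] subArray(int fromIndex, int toIndex) {
        return Arrays.copyOfRange(backing.toArray(), fromIndex, toIndex);
    }
}
```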






[jira] [Created] (HDFS-12652) INodeAttributesProvider#getAttributes(): Avoid multiple conversions of path components byte[][] to String[] when requesting INode attributes

2017-10-12 Thread Manoj Govindassamy (JIRA)
Manoj Govindassamy created HDFS-12652:
-

 Summary: INodeAttributesProvider#getAttributes(): Avoid multiple 
conversions of path components byte[][] to String[] when requesting INode 
attributes
 Key: HDFS-12652
 URL: https://issues.apache.org/jira/browse/HDFS-12652
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.0.0-beta1
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy


{{INodeAttributesProvider#getAttributes}} needs the path components passed in 
to be an array of Strings, whereas the INode and related layers maintain path 
components as an array of byte[]. So, these layers have to convert each 
byte[] component of the path back into a String, multiple times, when 
requesting INode attributes from the provider.

That is, the path "/a/b/c" requires calling the attribute provider with: (1) 
"", (2) "", "a", (3) "", "a", "b", (4) "", "a", "b", "c". Every single one of 
those strings is freshly (re)converted from a byte[]. Say a file listing is 
done on a huge directory containing 100s of millions of files; then these 
repeated, redundant conversions of byte[][] to String[] create lots of tiny 
garbage objects, occupying memory and hurting performance. Better if we could 
avoid creating redundant copies of path component strings.
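The redundant re-decoding could be avoided by converting once and reusing prefixes of the decoded array. A hypothetical sketch (the names are illustrative, not the actual HDFS internals):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: decode the byte[][] path components to String[]
// once, then hand the attributes provider growing prefixes of that single
// array instead of re-decoding the bytes at every depth.
class PathComponents {
    // One-time conversion of all components.
    static String[] toStrings(byte[][] components) {
        String[] names = new String[components.length];
        for (int i = 0; i < components.length; i++) {
            names[i] = new String(components[i], StandardCharsets.UTF_8);
        }
        return names;
    }

    // Prefix of length 'depth' reuses the already-decoded Strings;
    // only the small reference array is copied.
    static String[] prefix(String[] names, int depth) {
        return Arrays.copyOf(names, depth);
    }
}
```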






[jira] [Created] (HDFS-12651) Ozone: SCM: avoid synchronously loading all the keys from containers upon SCM datanode start

2017-10-12 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-12651:
-

 Summary: Ozone: SCM: avoid synchronously loading all the keys from 
containers upon SCM datanode start
 Key: HDFS-12651
 URL: https://issues.apache.org/jira/browse/HDFS-12651
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7240
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


This is based on code review feedback from HDFS-12411 to avoid slow SCM 
datanode restarts when there are large numbers of keys and containers.

E.g., 5 GB per container / 4 KB per key = 1.25 million keys per container.

The proposed solution is to load the container/key size info asynchronously 
and update the containerStatus once done.
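A rough sketch of the async approach using plain java.util.concurrent types (class and method names are hypothetical, not the Ozone SCM code):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: at startup, kick off the per-container key scan in
// the background and record each container's status as its scan finishes,
// instead of scanning every container synchronously before serving.
class ContainerStatusLoader {
    private final Map<String, Long> containerKeyCounts = new ConcurrentHashMap<>();

    // Startup path: returns immediately; the scan completes later.
    CompletableFuture<Void> loadAsync(String containerId) {
        return CompletableFuture.runAsync(() ->
            containerKeyCounts.put(containerId, scanKeyCount(containerId)));
    }

    // Stand-in for the real (potentially slow) per-container key scan.
    private long scanKeyCount(String containerId) {
        return 0L;
    }

    // null until the background scan for this container has completed.
    Long status(String containerId) {
        return containerKeyCounts.get(containerId);
    }
}
```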






[jira] [Created] (HDFS-12650) Use slf4j instead of log4j in LeaseManager

2017-10-12 Thread Ajay Kumar (JIRA)
Ajay Kumar created HDFS-12650:
-

 Summary: Use slf4j instead of log4j in LeaseManager
 Key: HDFS-12650
 URL: https://issues.apache.org/jira/browse/HDFS-12650
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ajay Kumar
Assignee: Ajay Kumar
 Fix For: 3.1.0


FSNamesystem is still using log4j dependencies. We should move those to 
slf4j, as most of the methods using log4j are deprecated.
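The migration typically looks like the following sketch (the class name is illustrative; this is not the actual LeaseManager diff):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of the usual log4j-to-slf4j migration pattern.
class LeaseManagerExample {
    // Before: static final Log LOG = LogFactory.getLog(LeaseManager.class);
    static final Logger LOG = LoggerFactory.getLogger(LeaseManagerExample.class);

    void renew(String holder) {
        // {}-style parameterized logging replaces string concatenation,
        // so the message is only built when the level is enabled.
        LOG.debug("Renewing lease for {}", holder);
    }
}
```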






[jira] [Created] (HDFS-12649) handling of corrupt blocks not suitable for commodity hardware

2017-10-12 Thread Gruust (JIRA)
Gruust created HDFS-12649:
-

 Summary: handling of corrupt blocks not suitable for commodity 
hardware
 Key: HDFS-12649
 URL: https://issues.apache.org/jira/browse/HDFS-12649
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.8.1
Reporter: Gruust
Priority: Minor


Hadoop's documentation tells me it's suitable for commodity hardware in the 
sense that hardware failures are expected to happen frequently. However, there 
is currently no automatic handling of corrupted blocks, which seems a bit 
contradictory to me.

See: https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files

This is even problematic for data integrity as the redundancy is not kept at 
the desired level without manual intervention. If there is a corrupted block, I 
would at least expect that the namenode forces the creation of an additional 
good replica to keep up the redundancy level. 






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-10-12 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/555/

[Oct 11, 2017 7:45:28 AM] (kai.zheng) HDFS-12635. Unnecessary exception 
declaration of the CellBuffers
[Oct 11, 2017 8:57:38 AM] (rohithsharmaks) MAPREDUCE-6951. Improve exception 
message when
[Oct 11, 2017 9:09:53 AM] (aajisaka) HDFS-12622. Fix enumerate in 
HDFSErasureCoding.md. Contributed by Yiqun
[Oct 11, 2017 3:31:02 PM] (jlowe) YARN-7082. TestContainerManagerSecurity 
failing in trunk. Contributed by
[Oct 11, 2017 5:06:43 PM] (stevel) HADOOP-14913. Sticky bit implementation for 
rename() operation in Azure
[Oct 11, 2017 6:14:33 PM] (sunilg) YARN-6620. Add support in NodeManager to 
isolate GPU devices by using
[Oct 11, 2017 7:26:14 PM] (arp) HDFS-12627. Fix typo in DFSAdmin command 
output. Contributed by Ajay
[Oct 11, 2017 7:29:35 PM] (arp) HDFS-12542. Update javadoc and documentation 
for listStatus. Contributed
[Oct 11, 2017 10:21:21 PM] (Arun Suresh) HADOOP-13556. Change 
Configuration.getPropsWithPrefix to use getProps
[Oct 11, 2017 10:25:28 PM] (wangda) YARN-7205. Log improvements for the 
ResourceUtils. (Sunil G via wangda)
[Oct 11, 2017 10:58:20 PM] (aengineer) HADOOP-13102. Update GroupsMapping 
documentation to reflect the new




-1 overall


The following subsystems voted -1:
unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.crypto.key.TestCachingKeyProvider 
   hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency 
   hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler 
   hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter 
   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt 

Timed out junit tests :

   
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.TestZKConfigurationStore
 
   
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl 
   org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart 
   
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
 
   
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler
 
   org.apache.hadoop.yarn.server.resourcemanager.TestRMHA 
   
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 
   
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler 
   
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore 
   
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore 
   
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 
   
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService 
   org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService 
   org.apache.hadoop.yarn.server.resourcemanager.TestLeaderElectorService 
   
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesReservation
 
   
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue 
   
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivities
 
   
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps 
   org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf 
   org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage 
   
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA 
   
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor
 
   
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAsyncScheduling
 
   
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 
   
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart 
   
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf 
   
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart 
   org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens 
   
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
 
   org.apache.hadoop.yarn.client.api.impl.TestAMRMClient 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/555/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/555/artifact/out/diff-compile-javac-root.txt
  [288K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/555/artifact/out/diff-checksty

[jira] [Created] (HDFS-12648) DN should provide feedback to NN for throttling commands

2017-10-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12648:
--

 Summary: DN should provide feedback to NN for throttling commands
 Key: HDFS-12648
 URL: https://issues.apache.org/jira/browse/HDFS-12648
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


The NN should avoid sending commands to a DN that has a high number of 
outstanding commands.  The heartbeat could provide this feedback, perhaps via 
a simple count of outstanding commands or the rate of processing.
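One possible shape of such feedback, sketched with hypothetical names and a made-up threshold:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: the DN tracks its outstanding-command count and
// reports it in each heartbeat; the NN withholds new commands once the
// reported load crosses a threshold. All names and the limit are
// hypothetical, not the actual heartbeat protocol.
class CommandBackpressure {
    private static final int MAX_OUTSTANDING = 500; // hypothetical limit

    private final AtomicInteger outstanding = new AtomicInteger();

    void commandQueued()   { outstanding.incrementAndGet(); }
    void commandFinished() { outstanding.decrementAndGet(); }

    // Value the DN would piggyback on its heartbeat.
    int heartbeatLoad() { return outstanding.get(); }

    // NN-side check before dispatching more commands to this DN.
    static boolean shouldThrottle(int reportedLoad) {
        return reportedLoad >= MAX_OUTSTANDING;
    }
}
```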






[jira] [Created] (HDFS-12647) DN commands processing should be async

2017-10-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12647:
--

 Summary: DN commands processing should be async
 Key: HDFS-12647
 URL: https://issues.apache.org/jira/browse/HDFS-12647
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


Due to dataset lock contention, service actors may encounter significant 
latency while processing DN commands.  Even the queuing of async deletions 
requires multiple lock acquisitions.  A slow disk will cause a backlog of 
xceivers instantiating block senders/receivers, which starves the actor and 
leads to the NN falsely declaring the node dead.

Async processing of all commands will free the actor to perform its primary 
purpose of heartbeating and block reporting.  Note that FBRs (full block 
reports) will depend on queued block invalidations not being included in the 
report.
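A minimal sketch of the enqueue-and-drain pattern described above (names are hypothetical, not the actual BPServiceActor code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the heartbeat/actor thread only enqueues commands;
// a separate worker thread drains and executes them, so slow disks cannot
// stall heartbeating or block reporting.
class AsyncCommandProcessor {
    private final BlockingQueue<Runnable> commands = new LinkedBlockingQueue<>();
    private final Thread worker;

    AsyncCommandProcessor() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    commands.take().run(); // may block on IO; actor unaffected
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "command-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Called from the service actor: O(1), never touches the dataset lock.
    void submit(Runnable command) {
        commands.add(command);
    }
}
```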






[jira] [Created] (HDFS-12646) Avoid IO while holding the FsDataset lock

2017-10-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12646:
--

 Summary: Avoid IO while holding the FsDataset lock
 Key: HDFS-12646
 URL: https://issues.apache.org/jira/browse/HDFS-12646
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


IO operations should not be allowed while holding the dataset lock.  Notable 
offenders include, but are not limited to: instantiating a block 
sender/receiver, constructing the path to a block, and unfinalizing a block.
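The pattern being asked for, sketched with illustrative names (not the FsDatasetImpl API): resolve the in-memory state under the lock, then perform the disk IO after releasing it.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of the lock discipline: the dataset lock guards only
// the in-memory lookup; file access happens after the lock is released.
class DatasetLockDiscipline {
    private final ReentrantLock datasetLock = new ReentrantLock();

    String resolveAndOpen(long blockId) {
        final String path;
        datasetLock.lock();
        try {
            path = "/data/current/blk_" + blockId; // in-memory state only
        } finally {
            datasetLock.unlock();
        }
        return openForRead(path); // disk access outside the lock
    }

    // Stand-in for the real file open.
    private String openForRead(String path) {
        return path;
    }
}
```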






[jira] [Created] (HDFS-12645) FSDatasetImpl lock will stall BP service actors and may cause missing blocks

2017-10-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12645:
--

 Summary: FSDatasetImpl lock will stall BP service actors and may 
cause missing blocks
 Key: HDFS-12645
 URL: https://issues.apache.org/jira/browse/HDFS-12645
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


The DN is extremely susceptible to a slow volume due to bad locking 
practices.  DN operations require the fs dataset lock.  IO while holding the 
dataset lock should not be permissible, as it leads to severe performance 
degradation and possibly (temporarily) missing blocks.

A slow disk will cause pipelines to experience significant latency and 
timeouts, increasing lock/IO contention while cleaning up, leading to more 
timeouts, etc.  Meanwhile, the actor service thread is interleaving multiple 
lock acquire/releases with xceivers.  If many commands are issued, the node 
may be incorrectly declared dead.

HDFS-12639 documents that both actors synchronize on the offer service lock 
while processing commands.  A backlogged active actor will block the standby 
actor and cause it to go dead too.






[jira] [Created] (HDFS-12644) Offer a non-privileged listEncryptionZone operation

2017-10-12 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-12644:
--

 Summary: Offer a non-privileged listEncryptionZone operation
 Key: HDFS-12644
 URL: https://issues.apache.org/jira/browse/HDFS-12644
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: encryption, namenode
Affects Versions: 3.0.0-alpha1, 2.8.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


As discussed in HDFS-12484, we can consider adding a non-privileged 
listEncryptionZone for better user experience.






[jira] [Resolved] (HDFS-11797) BlockManager#createLocatedBlocks() can throw ArrayIndexOutofBoundsException when corrupt replicas are inconsistent

2017-10-12 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-11797.

Resolution: Duplicate

I'm going to close it as a dup of HDFS-11445. Feel free to reopen if this is 
not the case. Thanks [~kshukla]!

> BlockManager#createLocatedBlocks() can throw ArrayIndexOutofBoundsException 
> when corrupt replicas are inconsistent
> --
>
> Key: HDFS-11797
> URL: https://issues.apache.org/jira/browse/HDFS-11797
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Critical
> Attachments: HDFS-11797.001.patch
>
>
> The calculation for {{numMachines}} can be too small (causing 
> ArrayIndexOutOfBoundsException) or too large (causing an NPE (HDFS-9958)) if 
> data structures find an inconsistent number of corrupt replicas. This was 
> earlier found to be related to failed storages. This JIRA tracks a change 
> that works for all possible cases of inconsistencies.






[jira] [Resolved] (HDFS-12630) Rolling restart can create inconsistency between blockMap and corrupt replicas map

2017-10-12 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-12630.

Resolution: Duplicate

> Rolling restart can create inconsistency between blockMap and corrupt 
> replicas map
> --
>
> Key: HDFS-12630
> URL: https://issues.apache.org/jira/browse/HDFS-12630
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Andre Araujo
>
> After a NN rolling restart several HDFS files started showing block problems. 
> Running FSCK for one of the files or for the directory that contained it 
> would complete with a FAILED message but without any details of the failure.
> The NameNode log showed the following:
> {code}
> 2017-10-10 16:58:32,147 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> FSCK started by hdfs (auth:KERBEROS_SSL) from /10.92.128.4 for path 
> /user/prod/data/file_20171010092201.csv at Tue Oct 10 16:58:32 PDT 2017
> 2017-10-10 16:58:32,147 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent 
> number of corrupt replicas for blk_1941920008_1133195379 blockMap has 1 but 
> corrupt replicas map has 2
> 2017-10-10 16:58:32,147 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Fsck on path '/user/prod/data/file_20171010092201.csv' FAILED
> java.lang.ArrayIndexOutOfBoundsException
> {code}
> After triggering a full block report for all the DNs the problem went away.






[jira] [Created] (HDFS-12643) HDFS maintenance state behaviour is confusing and not well documented

2017-10-12 Thread Andre Araujo (JIRA)
Andre Araujo created HDFS-12643:
---

 Summary: HDFS maintenance state behaviour is confusing and not 
well documented
 Key: HDFS-12643
 URL: https://issues.apache.org/jira/browse/HDFS-12643
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation, namenode
Reporter: Andre Araujo


The current implementation of the HDFS maintenance state feature is confusing 
and error-prone. The documentation is missing important information that's 
required for the correct use of the feature.

For example, if the Hadoop admin wants to put a single node in maintenance 
state, he/she can add a single entry to the maintenance file with the contents:

{code}
{
   "hostName": "host-1.example.com",
   "adminState": "IN_MAINTENANCE",
   "maintenanceExpireTimeInMS": 1507663698000
}
{code}

Let's say now that the actual maintenance finished well before the set 
expiration time and the Hadoop admin wants to bring the node back to NORMAL 
state. It would be natural to simply change the state of the node, as shown 
below, and run another refresh:

{code}
{
   "hostName": "host-1.example.com",
   "adminState": "NORMAL"
}
{code}

The configuration file above, though, not only takes the node {{host-1}} out 
of maintenance state but also *blacklists all the other DataNodes*. This 
behaviour seems inconsistent to me and is due to {{emptyInServiceNodeLists}} 
being set to {{false}} 
[here|https://github.com/apache/hadoop/blob/230b85d5865b7e08fb7aaeab45295b5b966011ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java#L80]
 only when there is at least one node with {{adminState = NORMAL}} listed in 
the file.

I believe that it would be more consistent, and less error-prone, to simply 
implement the following:
* If the dfs.hosts file is empty, all nodes are allowed and in normal state
* If the file is not empty, any host *not* listed in the file is 
*blacklisted*, regardless of the state of the hosts listed in the file.

Regardless of the implementation being changed or not, the documentation also 
needs to be updated to ensure the readers know of the caveats mentioned above.


