Solved: Here is the solution ...

Thanks Billie, you set me on the right track for tracking down the problem. The 
diagnostics given by Slider, saying that the resource requests are 
unsatisfiable, helped a lot.

The Cloudera config set up by the CM is wrong: it resolves all hosts to 
'/default-rack', while the rack actually configured for the Hadoop nodes was 
'/default'. So for every resource request, the Slider App Master sent the wrong 
rack name, because CM had configured it the wrong way.

The problem is that the Cloudera Manager (CM) uses tons of Hadoop configuration 
directories and a slightly different way of resolving hosts to racks ... AND 
IT NEVER USES THE HADOOP_CONF_DIR you set ...

When you look at the standard topology settings in the core-site.xml of the 
client configuration that the CM provides, you see the following:

net.topology.impl: org.apache.hadoop.net.NetworkTopology
net.topology.node.switch.mapping.impl: org.apache.hadoop.net.ScriptBasedMapping
net.topology.script.file.name: 
net.topology.script.number.args: 100
net.topology.table.file.name:

As you can easily see, they configure ScriptBasedMapping but don't provide a 
script. In this case the rack always resolves to '/default-rack', as explained 
in the Hadoop docs.
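
In case it helps someone: a quick way to check any given configuration 
directory for this situation is to read the net.topology.* properties straight 
out of its core-site.xml. This is only a minimal sketch. It assumes the usual 
Hadoop <configuration>/<property>/<name>/<value> layout, the function names are 
made up for illustration, and it only looks at the explicitly configured case 
shown above.

```python
import xml.etree.ElementTree as ET


def topology_settings(core_site_path):
    """Return all net.topology.* properties from a core-site.xml as a dict."""
    root = ET.parse(core_site_path).getroot()
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith("net.topology."):
            settings[name] = value
    return settings


def falls_back_to_default_rack(settings):
    """True if ScriptBasedMapping is configured explicitly without a script."""
    mapping = settings.get("net.topology.node.switch.mapping.impl", "")
    script = settings.get("net.topology.script.file.name", "")
    return mapping.endswith("ScriptBasedMapping") and not script
```

Calling topology_settings() on the core-site.xml of the directory you care 
about and passing the result to falls_back_to_default_rack() tells you whether 
that directory has the same problem as the settings listed above.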

Okay, we told the IT department to configure the correct topology script, and 
this is where it starts to get horrible ... 

We downloaded the new client configuration from the CM. Everything looked fine: 
the topology script was defined, and calling the script resolved hostnames to 
the correct rack '/default'.
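
For readers who have never seen one: a topology script gets hostnames or IPs as 
command-line arguments (at most net.topology.script.number.args per call) and 
has to print one rack per argument to stdout. The following is just a sketch of 
the idea and NOT the script our IT department installed. The host table and the 
example hostnames are hypothetical.

```python
#!/usr/bin/env python3
import sys

# Hypothetical host-to-rack table; a real script would load this from a file
# maintained by whoever administers the cluster.
RACKS = {
    "node1.example.com": "/default",
    "10.0.0.11": "/default",
}
# Rack returned for hosts missing from the table (assumption for this sketch).
FALLBACK_RACK = "/default"


def resolve_racks(hosts, table=RACKS, fallback=FALLBACK_RACK):
    """Return one rack per host, in the same order as the arguments."""
    return [table.get(host, fallback) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the hosts as arguments and reads the racks from stdout.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

Running such a script by hand with a few hostnames is exactly the check we did 
on the downloaded client configuration.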

But the Slider App Master still said the requests were unsatisfiable and still 
resolved hosts to '/default-rack'. We couldn't believe our eyes, since the 
topology settings looked good.

Then I found this in the Slider App Master logs:

2017-08-02 14:20:05,066 [main] INFO  appmaster.SliderAppMaster - System env 
HADOOP_CONF_DIR=/run/cloudera-scm-agent/process/2390-yarn-NODEMANAGER

What is that? Of course this damn CM knows better which configuration to inject 
... and of course nobody is allowed to look at that config, at least we were 
not allowed ... 
So we asked the IT department to check the topology settings in the 
core-site.xml in that directory, and they were STILL wrong ... no matter what 
you configured in CM, they didn't change.
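
If you want to find out which configuration directory your App Master really 
got, you can pick that line out of the log. A minimal sketch follows. The 
pattern is based only on the single log line quoted above, so other Slider 
versions may print it differently, and the function name is made up.

```python
import re

# Matches the "System env HADOOP_CONF_DIR=..." message shown above. The \s+
# also tolerates the line being wrapped, as it is in this mail.
CONF_DIR_PATTERN = re.compile(r"System env\s+HADOOP_CONF_DIR=(\S+)")


def find_conf_dirs(log_text):
    """Return every HADOOP_CONF_DIR value reported in the given log text."""
    return CONF_DIR_PATTERN.findall(log_text)
```

The core-site.xml inside the directory it returns is the one whose topology 
settings actually matter for the App Master.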

Finally we found a way to set the right topology properties via CM:

Go to "YARN Service Advanced Configuration Snippet (Safety Valve) for 
core-site.xml" in Cloudera Manager and insert the topology settings there. 
Cloudera thinks topology settings are only for HDFS and NOT for YARN ... oh my 
god ... after this change we finally had the correct topology settings in this 
damn CM shadow Hadoop configuration directory ... AND Slider resolved the 
correct rack, worked as intended and could place its containers on the 
requested resources ...

I think it is a pity that the YARN ResourceManager logs don't show rejected 
resource requests, even in DEBUG mode, and don't give any reason why a request 
was rejected ... in my opinion the YARN ResourceManager should log denied 
resource requests as a WARNING or at least as an INFO ... I think denied 
requests are of much more interest and importance than accepted requests.

Sorry guys for all the inconvenience, and thanks for your help. I hope you get 
Slider integrated into Hadoop as a standard tool ... cheerio

--

On 01.08.17, 15:47, "Billie Rinaldi" <billie.rina...@gmail.com> wrote:

    I don't think it would matter which scheduler you are using.
 
