[ 
https://issues.apache.org/jira/browse/BIGTOP-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360207#comment-14360207
 ] 

Michael Weiser commented on BIGTOP-1689:
----------------------------------------

I had a look at it but haven't actually tested it. From what I can see the 
implementation is fine and a lot of effort seems to have gone into it. So I'd 
hate to shoot it down. Reparsing the XML document certainly has the advantage 
that the settings can be merged properly and duplicate entries be avoided.

However, only XML seems to be supported at the moment. Support for other file 
formats such key=value (*.cfg, hadoop-env.sh) would need additional functions 
and would quickly lead to a need of some sort of plug-in system. Also, the 
custom parser function somewhat obfuscates how this logic works and makes it 
less accessible to other developers for future extensions.

The alternative of generating the whole config file from a hash would be very 
intrusive and would need to handle cases where currently we insert certain 
parts of the config only if some condition is met and values are generated e.g. 
from arrays using some sort of concatenation or looping over it and inserting 
multiple settings at once.

I have to admit, I do not know if the Hadoop XML configuration file syntax 
actually requires each setting to be given only once or if it supports a parse 
order for overriding the same setting within the same file. It looks like XML 
so assuming the former is natural but not necessarily true. Can an actual 
Hadoop expert chime in here? [~cos]?

Also, we should consider if it's actually necessary to allow *overriding* of 
settings or if it would be enough to allow *adding* of additional settings. We 
already have a logic to adjust settings we currently put into the configs. If 
there are static values, then those could and should be eliminated by either 
removing them and using the default, adding a new setting as an explicit class 
variable or implementing the additional option logic discussed here and adding 
that option as a default value there. If approached that way, adding of 
addtional settings should be enough and the need for merging and overriding 
disappears.

Without actually having tried it, I think it should be possible to implement a 
hybrid approach of keeping the current templates but adding a footer that 
generates additional config entries from the override hash using something like 
this:
{noformat}
<% @override_hash.each do |setting,value| -%>
  <property>
    <name><%= setting %></name>
    <value><%= value %></value>
  </property>
<% end -%>
{noformat}
This could be adjusted for different file formats as necessary. Duplicate 
settings (the hash overriding the template) would need to be handled by the 
config file parser of the respective Hadoop component. This would need to be 
investigated for each config file format. As, said, for the most commonly used 
XML format I'm not sure if and how this will work. In the worst case, trying to 
override a setting would lead to undefined behaviour by the configuration 
parser of the software. I don't think there's an easy way to prevent that. An 
error message and failure to start from the software would IMO be acceptable.

To avoid duplicating that code in each template, it would be great if it could 
be recursively included from a separate fragment. However, from a quick look at 
the documentation, Puppet doesn't seem to allow that. It does allow to specify 
multiple template files to the template function however. So something like 
this should work:
{noformat}
template('hdfs-site.xml', 'override.xml')
{noformat}
The only disadvantage here is that for XML files the closing "</configuration>" 
tag would need to move from the main template into the footer leaving two 
non-XML-compliant template fragments.

It might also be worth a look how perhaps the puppetlabs concat or other 
modules can help implement this. I'd rather have a dependency on a standard 
module everyone has installed already anyways (such as concat) than any kind of 
special logic.

[~petersla]: What do you think? Am I too naive in my approach to this?

> puppet: Allow merging arbitrary site configuration
> --------------------------------------------------
>
>                 Key: BIGTOP-1689
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1689
>             Project: Bigtop
>          Issue Type: Improvement
>          Components: deployment
>    Affects Versions: 0.8.0
>            Reporter: Peter Slawski
>            Assignee: Peter Slawski
>             Fix For: 0.9.0
>
>         Attachments: BIGTOP-1689.1.patch
>
>
> Puppet should be flexible in allowing arbitrary configuration name value 
> pairs to be merged into a given site.xml file that was generated from a 
> template.
> For example, the following could be included in site.yaml which would add a 
> configuration entry for hadoop.tmp.dir in core-site.xml:
> {code}
> hadoop::common_hdfs::hadoop_core_site_overrides:
>   "hadoop.tmp.dir": "/mnt/var/lib/hadoop/tmp"
> {code}
> This could be implemented as a puppet custom-function taking in the output of 
> template:
> {code}
>     file {
>       "/etc/hadoop/conf/core-site.xml":
>         content => merge_site(template('hadoop/core-site.xml'),  
> $hadoop_core_site_overrides)
>         require => [Package["hadoop"]],
>     }
> {code}
> Perhaps another approach would be to have site.xml templates be created from 
> a single map of name value pairs. The merge would happen before the file 
> content is generated from the template.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to