An approach worth looking at is a simple topology DSL, say in YAML, that the end user interacts with to specify the logical set of nodes, which daemons run on each node, and how they are interconnected (and, for more complex cases such as HA, how monitors, user authentication, etc. could plug in). It would then be easy to write some simple code to "compile" this into, for example, hiera.yaml files, an ENC, or both.
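To make this a bit more concrete, here is a minimal, purely illustrative sketch of what such a topology DSL could look like; the key names, group names, and daemon labels are made up for this example and are not a proposed format:

{noformat}
# Hypothetical topology DSL sketch -- illustrative only, not a proposed format
cluster: example
node_groups:
  master:
    nodes: [master01]                    # logical names only, no IPs or FQDNs
    daemons: [namenode, resourcemanager]
  slaves:
    nodes: [slave01, slave02, slave03]
    daemons: [datanode, nodemanager]
  gateway:
    nodes: [gw01]
    daemons: [hadoop-client]
{noformat}

A small "compiler" could then turn each logical node into a per-node hieradata file listing the classes to include (much like the hieradata/node/node1.do.main.yaml example quoted further down), or emit the equivalent ENC output.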
I think this raises the question of what type of end user is expected to use this; if it is a Puppet-savvy end user, then having them specify things in hiera would be clear, but if we are going after end users that are not well versed in Puppet, a Bigtop topology DSL may resonate better and can be simpler. I have done much work in this area, so I would be happy to propose a starting point for a topology DSL if this approach makes sense. I can also flesh out a number of issues that could be addressed, to see what priority people would give them.

One issue, for example, is whether the topology only identifies nodes and groups of nodes logically (e.g., the set of slaves) and does not require IP or DNS addresses to be assigned to them; this allows more shareable designs without locking users into one team's deployment-specific settings. It also facilitates a process where, given a topology, we spin up a set of nodes and, in a late-binding way, attach the host addresses to the nodes and to the attributes used for connecting between hosts.

For the specific example of init-hdfs.sh, if I am guessing the issue correctly, you ideally want to create only the directories for services that are actually being used, rather than creating all of them; equivalently, for the data-driven way in which this is generated, you want to construct the description of directories to include as a function of which daemons are on the topology nodes. This is something I have tackled and can include in a write-up if there is interest (a rough, hypothetical sketch is appended below the quoted thread).

- Rich

----- Original Message -----
From: dev@bigtop.apache.org
To: <dev@bigtop.apache.org>
Cc:
Sent: Tue, 10 Mar 2015 09:19:38 +0000 (UTC)
Subject: [jira] [Commented] (BIGTOP-1746) Introduce the concept of roles in bigtop cluster deployment

    [ https://issues.apache.org/jira/browse/BIGTOP-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354586#comment-14354586 ]

Michael Weiser commented on BIGTOP-1746:
----------------------------------------

At least in the Puppet manifests, there are only two places where the current role concept is implemented:
- manifests/cluster.pp decides which daemons to put on which box
- hieradata/bigtop/cluster.yaml governs what to put into their config files

cluster.yaml can easily be adjusted and overridden so that it no longer forces the concept of head node and frontend into the config files. So the main point of attack, from my point of view, is cluster.pp. Unfortunately it also implements some dependencies between modules and some add-on logic, such as running init-hdfs.sh.

Basically I would suggest moving these dependencies into their respective modules and then just throwing away cluster.pp. After that, the classes could be included directly using hiera_include, with the hiera lookup hierarchy, or possibly an ENC or facts, governing which roles a machine has.

I have a setup where I've basically done that. I have changes I was already planning to propose for merging that move the dependencies mostly into the hadoop module. That would render cluster.pp quite empty already. I also have a concept for assigning roles to nodes via hiera. This, however, is a bit convoluted and would need streamlining for inclusion in mainline BigTop.
In the most basic case, classes such as hadoop::namenode can just be assigned to nodes directly, like this:

manifests/site.pp:
{noformat}
hiera_include("classes")
{noformat}

hiera.yaml:
{noformat}
---
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "node/%{::fqdn}"
  - site
  - bigtop/cluster
{noformat}

hieradata/node/node1.do.main.yaml:
{noformat}
---
classes:
  - hadoop::namenode
  - hadoop-zookeeper::server
{noformat}

> Introduce the concept of roles in bigtop cluster deployment
> -----------------------------------------------------------
>
>         Key: BIGTOP-1746
>         URL: https://issues.apache.org/jira/browse/BIGTOP-1746
>     Project: Bigtop
>  Issue Type: New Feature
>  Components: deployment
>    Reporter: vishnu gajendran
>      Labels: features
>     Fix For: 0.9.0
>
>
> Currently, during cluster deployment, Puppet categorizes nodes as head_node, worker_nodes, gateway_nodes, or standby_node based on user-specified info. This functionality gives the user control over picking a particular node as head_node, standby_node, or gateway_node, with the rest as worker_nodes. But I would like more fine-grained control over which daemons run on which node. For example, I may not want to run the namenode and a datanode on the same node. This functionality can be introduced with the concept of roles. Each node can be assigned a set of roles. For example, Node A can be assigned the ["namenode", "resourcemanager"] roles, Node B can be assigned ["datanode", "nodemanager"], and Node C can be assigned ["nodemanager", "hadoop-client"]. Now each node will only run the specified daemons. A prerequisite for this kind of deployment is that each node is given the configuration it needs to know. For example, each datanode should know which node is the namenode, etc. This functionality will allow users to customize the cluster deployment according to their needs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
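P.S. Regarding the data-driven init-hdfs.sh point above, here is a rough, hypothetical sketch of what a per-component description of HDFS directories could look like; the component names, paths, owners, and permissions are invented for illustration and do not reflect what init-hdfs.sh actually creates:

{noformat}
# Hypothetical mapping from component to the HDFS directories it needs.
# The init-hdfs input would be generated as the union over the daemons
# that actually appear in the topology, rather than a fixed global list.
hdfs_directories_by_component:
  yarn:
    - { path: /tmp,                 owner: hdfs,  group: hdfs,   perm: "1777" }
    - { path: /var/log/hadoop-yarn, owner: yarn,  group: mapred, perm: "0775" }
  hbase:
    - { path: /hbase,               owner: hbase, group: hbase,  perm: "0755" }
  oozie:
    - { path: /user/oozie,          owner: oozie, group: oozie,  perm: "0755" }
{noformat}

A generator walking the topology would then emit only the subset of directory creations needed by the daemons that are actually deployed.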