An approach to consider is a simple topology DSL, say in YAML, that the
end user interacts with to specify the logical set of nodes, which daemons
run on each node, and how they are interconnected (for more complex things
like HA, or for indicating how monitors, user authentication, etc. could
plug in). It would then be easy to write some simple code to compile this
into, for example, hiera.yaml files, or an ENC, or even both.
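
To make that concrete, here is a hypothetical sketch of what such a
topology file might look like (every key and name below is made up for
illustration; this is not an existing format):

{noformat}
cluster: example
node_groups:
  masters:
    daemons: [namenode, resourcemanager]
  slaves:
    daemons: [datanode, nodemanager]
    count: 3
{noformat}

A small compiler could walk this structure and emit per-node hiera files,
or answer ENC queries with the list of classes for each node.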

I think this begs the question of what type of end user is expected to
use this: if it is a Puppet-savvy end user, then having them specify things
in hiera would be clear, but if going after end users who are not well
versed in Puppet, a "Bigtop topology DSL" may resonate better and can
be simpler.

Now, I have done much work around this area, so I would be happy to propose
a starting point for a topology DSL if this approach makes sense. I can
also flesh out a number of issues that could be addressed to see what
priority people would give them.

One issue, for example, is whether the topology just logically identifies
nodes and groups of nodes (e.g., the set of slaves) and does not require IP
or DNS addresses to be assigned to them; this allows more shareable designs
without locking users into one team's deployment-specific settings. It also
facilitates the process where, given a topology, we spin up a set of nodes
and, in a late-binding way, attach the host addresses to the nodes and to
the attributes used for connecting between hosts.
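
For instance, the shareable topology could name only logical groups, with
a separate, deployment-specific file binding hosts to them later (again, a
made-up format purely for illustration):

{noformat}
# topology.yaml - shareable, no addresses
node_groups:
  masters: {daemons: [namenode, resourcemanager]}
  slaves:  {daemons: [datanode, nodemanager]}

# bindings.yaml - produced after the nodes are provisioned
bindings:
  masters: [nn1.example.com]
  slaves:  [dn1.example.com, dn2.example.com]
{noformat}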

For the specific example of init-hdfs.sh, if I am guessing correctly at
what the issue is: ideally you want to create directories only for services
that are actually being used, not for all of them; equivalently, for the
data-driven way in which this is generated, you want to construct the
description of directories to include as a function of which daemons are on
the topology nodes. This is something I have tackled and can include in a
write-up if there is interest.
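
One way to sketch that data-driven approach: keep a mapping from daemon to
the HDFS directories it needs, and have the generator take the union over
the daemons actually present in the topology. The mapping below is
illustrative only, not the actual contents of init-hdfs.sh:

{noformat}
hdfs_dirs_by_daemon:
  yarn-resourcemanager:
    - {path: /app-logs, perm: "1777", owner: yarn}
  hive-server2:
    - {path: /user/hive/warehouse, perm: "1777", owner: hive}
  oozie-server:
    - {path: /user/oozie, perm: "755", owner: oozie}
{noformat}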
- Rich



----- Original Message -----
From:
 dev@bigtop.apache.org

To:
<dev@bigtop.apache.org>
Cc:

Sent:
Tue, 10 Mar 2015 09:19:38 +0000 (UTC)
Subject:
[jira] [Commented] (BIGTOP-1746) Introduce the concept of roles in bigtop
cluster deployment



[
https://issues.apache.org/jira/browse/BIGTOP-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354586#comment-14354586
]

Michael Weiser commented on BIGTOP-1746:
----------------------------------------

At least in the Puppet manifests there are only two places where the
current role concept is implemented:
- manifests/cluster.pp decides what daemons to put on which box
- hieradata/bigtop/cluster.yaml governs what to put into their config files

cluster.yaml can easily be adjusted and overridden so it doesn't force the
concept of head node and frontend into the config files any more. So the
main point of attack from my point of view is cluster.pp. Unfortunately it
also implements some dependencies between modules and some add-on logic,
such as running init-hdfs.sh. Basically I would suggest moving these
dependencies into their respective modules and then just throwing away
cluster.pp. After that, the classes could be included directly using
hiera_include with the hiera lookup hierarchy, or possibly an ENC or facts
governing which roles a machine has.

I have a setup where I've basically done that. I have changes I was already
planning to propose for merging that move dependencies mostly into the
hadoop module. That would render cluster.pp quite empty already. I also
have a concept for assigning roles to nodes via hiera. This, however, is a
bit convoluted and would need streamlining for inclusion in mainline
BigTop. In the most basic case, classes such as hadoop::namenode can just
be assigned to nodes directly such as this:

manifests/site.pp:
{noformat}
hiera_include("classes")
{noformat}

hiera.yaml:
{noformat}
---
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "node/%{::fqdn}"
  - site
  - bigtop/cluster
{noformat}

hieradata/node/node1.do.main.yaml:
{noformat}
---
classes:
  - hadoop::namenode
  - hadoop-zookeeper::server
{noformat}
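
As an illustration of the facts-based variant, the hierarchy could gain a
role level selected by a fact (names here are hypothetical; a real solution
would need to handle multiple roles per node, which is part of the
streamlining mentioned above):

{noformat}
hiera.yaml, additional hierarchy level:
- "roles/%{::bigtop_role}"

hieradata/roles/namenode.yaml:
---
classes:
  - hadoop::namenode
{noformat}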

> Introduce the concept of roles in bigtop cluster deployment
> -----------------------------------------------------------
>
> Key: BIGTOP-1746
> URL: https://issues.apache.org/jira/browse/BIGTOP-1746
> Project: Bigtop
> Issue Type: New Feature
> Components: deployment
> Reporter: vishnu gajendran
> Labels: features
> Fix For: 0.9.0
>
>
> Currently, during cluster deployment, puppet categorizes nodes as
head_node, worker_nodes, gateway_nodes, standby_node based on user-specified
info. This functionality gives the user control over picking a particular
node as head_node, standby_node, gateway_node and the rest as
worker_nodes. But I would like to have more fine-grained control over which
daemons should run on which node. For example, I do not want to run the
namenode and a datanode on the same node. This functionality can be
introduced with the concept of roles. Each node can be assigned a set of
roles. For example, Node A can be assigned ["namenode", "resourcemanager"]
roles, Node B can be assigned ["datanode", "nodemanager"], and Node C can
be assigned ["nodemanager", "hadoop-client"]. Now, each node will only run
the specified daemons. A prerequisite for this kind of deployment is that
each node be given the necessary configuration it needs. For example, each
datanode should know which node is the namenode, etc. This functionality
will allow users to customize the cluster deployment according to their
needs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
