Re: Thoughts and opinions in physically building a cluster
That doesn't sound too bad (it's a fairly typical setup, e.g. on an Amazon VPC). You probably want to avoid NAT or similar things between master and slaves so you don't need a lot of LIBPROCESS_IP tricks, so the same switch sounds good.

Personally I quite like the master/slave distinction. I wouldn't want a runaway set of tasks to bog down the masters, and operationally we'd alert if we're starting to lose masters, whereas the slaves are 'cattle' and we can just spin up more as they die if need be (it's a little more tricky to scale out masters and zookeepers, so they get treated as though they were a bit less expendable). I co-locate the zookeeper ensemble on the masters on smaller clusters to save VM count, but that's more personal taste than anything.

On 25 June 2015 at 17:12, Daniel Gaston daniel.gas...@dal.ca wrote:

So this may be another relatively noob question, but when designing a Mesos cluster, is it basically as simple as the nodes connected by a switch? Since any of the nodes can be masters, or act as both master and slave, I am guessing there is no need for a separate head node as you would have with a traditional cluster design. But would each of the nodes then have to be connected to the external/institutional network? My rough idea was for this small cluster not to be connected to the main institutional network, but for my workstation to be connected to both the cluster's network and the institutional network.

From: CCAAT cc...@tampabay.rr.com
Sent: June-19-15 4:57 PM
To: user@mesos.apache.org
Cc: cc...@tampabay.rr.com
Subject: Re: Thoughts and opinions in physically building a cluster

On 06/19/2015 01:28 PM, Daniel Gaston wrote: On 19/06/2015 18:38, Oliver Nicholas wrote: Unless you have some true HA requirements, it seems intuitively wasteful to have 3 masters and 2 slaves (unless the cost of 5 nodes is inconsequential to you and you hate the environment). Any particular reason not to have three nodes which are acting both as masters and slaves?
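[For readers following along: a minimal sketch of what the flat-network (no NAT) launch described above might look like. All IPs, ports, and paths are placeholders, and flag spellings should be checked against the Mesos version you actually install:]

```shell
# On each master (here 10.0.0.1-3), advertise the node's own address on the
# cluster switch so libprocess doesn't bind to a NAT'd or loopback interface:
export LIBPROCESS_IP=10.0.0.1
mesos-master --ip=10.0.0.1 \
             --zk=zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/mesos \
             --quorum=2 \
             --work_dir=/var/lib/mesos

# On each slave (here 10.0.0.11, 10.0.0.12, ...):
export LIBPROCESS_IP=10.0.0.11
mesos-slave --master=zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/mesos \
            --ip=10.0.0.11 \
            --work_dir=/var/lib/mesos
```

This assumes the co-located ZooKeeper ensemble on the three masters that the reply describes; the workstation-as-gateway design means only the workstation needs a second interface on the institutional network.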
None at all. I'm not a cluster or networking guru, and have only played with Mesos in cloud-based settings, so I wasn't sure how this would work. But it makes sense: that way the 'standby' masters are still participating in the zookeeper quorum while still being available to do real work as slave nodes. Daniel.

There is no such thing as a 'cluster guru'. It's all 'seat of the pants' flying right now, so you are fine with what you are doing and propose. If codes do not exist to meet your specific needs and goals, they can (should?) be created. I'm working on an architectural expansion where nodes (virtual, actual or bare metal) migrate from master -- entrepreneur -- worker -- slave -- embedded (bare metal or specially attached hardware). I'm proposing to do all of this with the Autonomy_Function, with decisions being made bottom-up as opposed to the current top-down dichotomy. I'll probably have to 'fork codes' for a while to get things stable, and then hope they are included once other minds see the validity of the ideas.

Surely one box can be set up as both master and slave. Moving slaves to masters should be an automatic function, and will probably be addressed in future Mesos codes.

PS: Keep pushing your ideas and do not take no for an answer! Mesos belongs to everybody. hth, James
Re: Thoughts and opinions in physically building a cluster
Thanks Oliver, a lot of great suggestions. One of the reasons I was interested in Mesos was the idea of it being more generalized. While this small HPC cluster will serve one primary job, it will also be used for research purposes, so being able to easily test out frameworks and not be 'locked in' to one way of doing things is appealing. Most jobs are relatively CPU/RAM heavy (and small-file disk I/O heavy, unfortunately), but I already have a good handle on building individual compute servers that would handle that, so they would make suitable slave/compute nodes. HA would be nice in terms of ensuring turn-around times on workflows, but it isn't a major issue: if the cluster is down for a few hours, no one will lose sleep or die. As long as a failed node can be brought back up reasonably quickly, it should be fine.
Re: Thoughts and opinions in physically building a cluster
On 19/06/2015 18:38, Oliver Nicholas wrote: Unless you have some true HA requirements, it seems intuitively wasteful to have 3 masters and 2 slaves (unless the cost of 5 nodes is inconsequential to you and you hate the environment). Any particular reason not to have three nodes which are acting both as master and slaves?
Re: Thoughts and opinions in physically building a cluster
On Fri, Jun 19, 2015 at 11:22 AM, Brian Candler b.cand...@pobox.com wrote: On 19/06/2015 18:38, Oliver Nicholas wrote: Unless you have some true HA requirements, it seems intuitively wasteful to have 3 masters and 2 slaves (unless the cost of 5 nodes is inconsequential to you and you hate the environment). Any particular reason not to have three nodes which are acting both as master and slaves?

Certainly seems reasonable to me!

-- *bigo* / oliver nicholas | staff engineer, infrastructure | uber technologies, inc.
Re: Thoughts and opinions in physically building a cluster
On Fri, Jun 19, 2015 at 10:03 AM, Daniel Gaston daniel.gas...@dal.ca wrote: Hi Everyone, I've looked through the archives and the web but still have some questions on this topic.

1) If I was looking at building a small compute/HPC cluster, is Mesos overkill in such a situation?

Mesos isn't overkill, though there may be platforms developed more specifically for your use case (vs. Mesos, which is extremely generalized).

2) What is the minimum number of physical nodes? It seems from documentation and examples ideally this is something like 5, with 3 masters and say two slaves.

Technically speaking, you can do it all with one node. It just depends what properties you need. Having three masters (or any HA grouping, i.e. an odd number greater than 1) is overkill if high availability isn't a requirement - you can just have a single master node and live with the fact that if it goes down, you can't schedule any new tasks until you bring it back. Unless you have some true HA requirements, it seems intuitively wasteful to have 3 masters and 2 slaves (unless the cost of 5 nodes is inconsequential to you and you hate the environment).

3) What are some other good resources in terms of doing this? Appropriate specs for individual nodes, particularly where you would likely want slave/compute nodes to be much beefier than master nodes. What other equipment would you need, just nodes and switches?

Depends what your workloads look like. Mesos itself (both master and slave) is very thin - under most circumstances it won't even need a whole CPU core to itself. Remember, Mesos itself doesn't do any real work other than coordination - it's the processes you use it to schedule/run that are going to use up the physical resources. So the question you ask yourself in this situation is: which primary resources does my workload use? Is it CPU heavy, memory heavy, maybe disk or network I/O heavy? That's how you decide what machines to throw at it.
The question is more or less the same whether you use Mesos to schedule or not. Identifying resource requirements should be possible both by understanding what the process does, and by measuring it with standard unix tools. As for the second part of your question, you just need a set of computers that can run modern Linux and talk to each other over TCP/IP. You probably want them on a private network.

4) Would it make sense to have a smaller number of physical nodes split up into virtual nodes, or will this just make everything much more complex?

This is probably not necessary. Mesos has native support for process isolation via cgroups, which obviates one of the advantages of VMs. Structurally, the whole *point* of Mesos is to abstract away the concept of individual machines into pools of compute capacity, so you're kinda working at cross purposes if you go down this road too far.

Any thoughts, opinions, or directions to resources is much appreciated! Cheers, Dan

-- *bigo* / oliver nicholas | staff engineer, infrastructure | uber technologies, inc.
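[To make the "odd number greater than 1" point above concrete, here is a small illustrative calculation - not Mesos code, just the standard majority-quorum arithmetic that Mesos/ZooKeeper HA relies on. It shows why 3 is the smallest useful HA master count and why 4 masters tolerate no more failures than 3:]

```python
def quorum(n):
    """Smallest majority of an ensemble of n nodes."""
    return n // 2 + 1

def tolerated_failures(n):
    """How many nodes can fail while a majority still survives."""
    return n - quorum(n)

for n in (1, 2, 3, 4, 5):
    print(f"{n} masters: quorum={quorum(n)}, "
          f"survivable failures={tolerated_failures(n)}")
```

With 1 or 2 masters you can lose none; with 3 or 4 you can lose one; with 5 you can lose two - which is why ensembles are sized at odd numbers.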
Re: Thoughts and opinions in physically building a cluster
Hi, exactly what I’ve been doing for a few smaller setups. I see this as the minimum ‘production’ setup. For testing or development I just run everything on a single node, sometimes even a VM. Regards, Eelco

On 19 Jun 2015, at 20:23, Oliver Nicholas b...@uber.com wrote: On Fri, Jun 19, 2015 at 11:22 AM, Brian Candler b.cand...@pobox.com wrote: On 19/06/2015 18:38, Oliver Nicholas wrote: Unless you have some true HA requirements, it seems intuitively wasteful to have 3 masters and 2 slaves (unless the cost of 5 nodes is inconsequential to you and you hate the environment). Any particular reason not to have three nodes which are acting both as master and slaves?

Certainly seems reasonable to me!

-- bigo / oliver nicholas | staff engineer, infrastructure | uber technologies, inc.
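[The single-node testing setup described above can be sketched roughly as follows. Paths are placeholders; the `mesos-local` helper mentioned in the second option ships with some Mesos builds, so check your install before relying on it:]

```shell
# Option 1: run both daemons on one box with no ZooKeeper.
# A single master means no HA, which is fine for development:
mesos-master --ip=127.0.0.1 --work_dir=/tmp/mesos-master &
mesos-slave  --master=127.0.0.1:5050 --work_dir=/tmp/mesos-slave &

# Option 2: some Mesos builds include a 'mesos-local' helper that
# starts an in-process master plus slave for quick experiments:
mesos-local
```

Either variant also works inside a single VM, matching the "sometimes even a VM" workflow.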
Re: Thoughts and opinions in physically building a cluster
Thanks for all of these comments - I had similar questions. What is the minimum RAM for a master or a slave? I have heard that the Mesos slave software adds 1GB of RAM on top of what the slave's workload processing will require. I have read that 8GB is the minimum for a Mesos machine, but it wasn't clear whether this was an official/hard requirement.

On Fri, Jun 19, 2015 at 11:31 AM, Eelco Maljaars | Maljaars IT ee...@maljaars-it.nl wrote: Hi, exactly what I’ve been doing for a few smaller setups. I see this as the minimum ‘production’ setup. For testing or development I just run everything on a single node, sometimes even a VM. Regards, Eelco

On 19 Jun 2015, at 20:23, Oliver Nicholas b...@uber.com wrote: On Fri, Jun 19, 2015 at 11:22 AM, Brian Candler b.cand...@pobox.com wrote: On 19/06/2015 18:38, Oliver Nicholas wrote: Unless you have some true HA requirements, it seems intuitively wasteful to have 3 masters and 2 slaves (unless the cost of 5 nodes is inconsequential to you and you hate the environment). Any particular reason not to have three nodes which are acting both as master and slaves?

Certainly seems reasonable to me!

-- *bigo* / oliver nicholas | staff engineer, infrastructure | uber technologies, inc.
Re: Thoughts and opinions in physically building a cluster
On 06/19/2015 01:45 PM, Dave Martens wrote: Thanks for all of these comments - I had similar questions. What is the minimum RAM for a master or a slave? I have heard that the Mesos slave software adds 1GB of RAM on top of what the slave's workload processing will require. I have read that 8GB is the min for a Mesos machine but it wasn't clear that this was an official/hard requirement.

There are probably published/standard numbers for the various distros that the slave node is built upon (actual or virtual). Actually, with a robust (CI) infrastructure, these sorts of resource metrics and the various benchmarks should be revealed to the user community routinely; I'm not certain what, if any, of this sort of data is being published. If you tune (strip) the operating system, the virtual image, or the kernel, then these numbers are most likely lower. I'm not sure much has been published on tuning the OS, kernels, or installs for Mesos. HPC offerings will surely be pushing the envelope on these and many more related metrics: performance tuning for specialized classes of problems, hardware specifics, and other goals. Most will run on bloatware, but the smarter datacenters and HPC folks will 'cut the pork' for those single-digit performance gains. YMMV. hth, James
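[Since hard published numbers seem scarce, one pragmatic option is simply to measure the daemons on a running node with the standard unix tools mentioned earlier in the thread. A sketch, assuming stock process names and default ports - adjust for your distro and packaging:]

```shell
# Resident set size (RSS, reported in KiB) of the Mesos daemons:
ps -C mesos-master -o rss=,comm=
ps -C mesos-slave  -o rss=,comm=

# Mesos daemons of this era also expose ongoing metrics over HTTP;
# e.g. the slave's snapshot endpoint (port 5051 by default):
curl -s http://localhost:5051/metrics/snapshot
```

Measuring your own workload this way answers the "1GB overhead / 8GB minimum" question empirically rather than by hearsay.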