Re: [openstack-dev] [TripleO] Tuskar CLI after architecture changes
On 12/20/2013 08:40 AM, Ladislav Smola wrote:
> On 12/20/2013 02:06 PM, Radomir Dopieralski wrote:
>> On 20/12/13 13:04, Radomir Dopieralski wrote:
>>> [snip]
>>
>> I have just learned that tuskar-api stays, so my whole ranting is just a waste of all our time. Sorry about that.
>
> Hehe. :-)
>
> Ok after the last meeting we are ready to say what goes to Tuskar-API. Who wants to start that thread? :-)

I'm writing something up, but I won't have anything worth showing until after the New Year (sounds so far away when I say it that way; it's simply that I'm on vacation starting today until the 6th).

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] UI Wireframes for Resource Management - ready for implementation
On 12/13/2013 01:53 PM, Tzu-Mainn Chen wrote: On 2013/13/12 11:20, Tzu-Mainn Chen wrote: These look good! Quick question - can you explain the purpose of Node Tags? Are they an additional way to filter nodes through nova-scheduler (is that even possible?), or are they there solely for display in the UI? Mainn We start easy, so that's solely for UI needs of filtering and monitoring (grouping of nodes). It is already in Ironic, so there is no reason why not to take advantage of it. -- Jarda Okay, great. Just for further clarification, are you expecting this UI filtering to be present in release 0? I don't think Ironic natively supports filtering by node tag, so that would be further work that would have to be done. Mainn I might be getting ahead of things, but will the tags be free-form entered by the user, pre-entered in a separate settings and selectable at node register/update time, or locked into a select few that we specify? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
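Since node tags came up, here is a rough sketch of how UI-only tags could be stored and filtered today using Ironic's free-form "extra" field via python-ironicclient. This is purely illustrative - the "tags" key, the connection values, and the client-side filtering are assumptions, not an agreed design.

```python
# Hypothetical sketch: keep UI-only tags in Ironic's free-form "extra" field.
# The "tags" key and connection values are illustrative assumptions.
from ironicclient import client

ironic = client.get_client(
    1,
    os_username="admin",
    os_password="secret",
    os_tenant_name="admin",
    os_auth_url="http://undercloud.example.com:5000/v2.0",
)

def set_tags(node_uuid, tags):
    """Overwrite the UI tags stored in the node's 'extra' metadata."""
    patch = [{"op": "add", "path": "/extra/tags", "value": tags}]
    return ironic.node.update(node_uuid, patch)

def nodes_with_tag(tag):
    """Filter client-side, since Ironic does not filter by tag natively."""
    matches = []
    for summary in ironic.node.list():
        node = ironic.node.get(summary.uuid)  # fetch the full record
        if tag in (node.extra or {}).get("tags", []):
            matches.append(node)
    return matches
```

As noted above, the filtering itself would have to happen client-side (or in new Ironic/Tuskar code), since there is no native tag filter to lean on.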
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
* ability to 'preview' changes going to the scheduler What does this give you? How detailed a preview do you need? What information is critical there? Have you seen the proposed designs for a heat template preview feature - would that be sufficient? Will will probably have a better answer to this, but I feel like at very least this goes back to the psychology point raised earlier (I think in this thread, but if not, definitely one of the TripleO ones). A weird parallel is whenever I do a new install of Fedora. I never accept their default disk partitioning without electing to review/modify it. Even if I didn't expect to change anything, I want to see what they are going to give me. And then I compulsively review the summary of what actual changes will be applied in the follow up screen that's displayed after I say I'm happy with the layout. Perhaps that's more a commentary on my own OCD and cynicism that I feel dirty accepting the magic defaults blindly. I love the idea of anaconda doing the heavy lifting of figuring out sane defaults for home/root/swap and so on (similarly, I love the idea of Nova scheduler rationing out where instances are deployed), but I at least want to know I've seen it before it happens. I fully admit to not knowing how common that sort of thing is. I suspect I'm in the majority of geeks and tame by sys admin standards, but I honestly don't know. So I acknowledge that my entire argument for the preview here is based on my own personality. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 12/12/2013 04:25 PM, Keith Basil wrote: On Dec 12, 2013, at 4:05 PM, Jay Dobies wrote: Maybe this is a valid use case? Cloud operator has several core service nodes of differing configuration types. [node1] <-- balanced mix of disk/cpu/ram for general core services [node2] <-- lots of disks for Ceilometer data storage [node3] <-- low-end "appliance like" box for a specialized/custom core service (SIEM box for example) All nodes[1,2,3] are in the same deployment grouping ("core services)". As such, this is a heterogenous deployment grouping. Heterogeneity in this case defined by differing roles and hardware configurations. This is a real use case. How do we handle this? This is the sort of thing I had been concerned with, but I think this is just a variation on Robert's GPU example. Rather than butcher it by paraphrasing, I'll just include the relevant part: "The basic stuff we're talking about so far is just about saying each role can run on some set of undercloud flavors. If that new bit of kit has the same coarse metadata as other kit, Nova can't tell it apart. So the way to solve the problem is: - a) teach Ironic about the specialness of the node (e.g. a tag 'GPU') - b) teach Nova that there is a flavor that maps to the presence of that specialness, and c) teach Nova that other flavors may not map to that specialness then in Tuskar whatever Nova configuration is needed to use that GPU is a special role ('GPU compute' for instance) and only that role would be given that flavor to use. That special config is probably being in a host aggregate, with an overcloud flavor that specifies that aggregate, which means at the TripleO level we need to put the aggregate in the config metadata for that role, and the admin does a one-time setup in the Nova Horizon UI to configure their GPU compute flavor." Yes, the core services example is a variation on the above. The idea of _undercloud_ flavor assignment (flavor to role mapping) escaped me when I read that earlier. It appears to be very elegant and provides another attribute for Tuskar's notion of resource classes. So +1 here. You mention three specific nodes, but what you're describing is more likely three concepts: - Balanced Nodes - High Disk I/O Nodes - Low-End Appliance Nodes They may have one node in each, but I think your example of three nodes is potentially *too* simplified to be considered as proper sample size. I'd guess there are more than three in play commonly, in which case the concepts breakdown starts to be more appealing. Correct - definitely more than three, I just wanted to illustrate the use case. I not sure I explained what I was getting at properly. I wasn't implying you thought it was limited to just three. I do the same thing, simplify down for discussion purposes (I've done so in my head about this very topic). But I think this may be a rare case where simplifying actually masks the concept rather than exposes it. Manual feels a bit more desirable in small sample groups but when looking at larger sets of nodes, the flavor concept feels less odd than it does when defining a flavor for a single machine. That's all. :) Maybe that was clear already, but I wanted to make sure I didn't come off as attacking your example. It certainly wasn't my intention. The balanced v. disk machine thing is the sort of thing I'd been thinking for a while but hadn't found a good way to make concrete. I think the disk flavor in particular has quite a few use cases, especially until SSDs are ubiquitous. 
I'd want to flag those (in Jay terminology, "the disk hotness") as hosting the data-intensive portions, but where I had previously been viewing that as manual allocation, it sounds like the approach is to properly categorize them for what they are and teach Nova how to use them. Robert - Please correct me if I misread any of what your intention was, I don't want to drive people down the wrong path if I'm misinterpretting anything. -k ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
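To make the "one-time setup" Robert describes a little more concrete, here is a rough sketch of what it might look like with python-novaclient rather than the Horizon UI. The aggregate name, the "special_hw" metadata key, and the flavor sizing are invented for illustration, and the scheduler needs AggregateInstanceExtraSpecsFilter enabled for the extra spec to take effect.

```python
# Illustrative sketch of the one-time Nova setup described above.
# Names, credentials, and the "special_hw" key are assumptions.
from novaclient.v1_1 import client

nova = client.Client("admin", "secret", "admin",
                     "http://overcloud.example.com:5000/v2.0")

# Group the specialized (GPU / "disk hotness") hosts into an aggregate.
agg = nova.aggregates.create("gpu-nodes", None)
nova.aggregates.set_metadata(agg, {"special_hw": "gpu"})

# Create a flavor that only lands on hosts in that aggregate
# (requires AggregateInstanceExtraSpecsFilter in the scheduler).
flavor = nova.flavors.create("gpu.compute", ram=32768, vcpus=8, disk=200)
flavor.set_keys({"aggregate_instance_extra_specs:special_hw": "gpu"})
```

Only the special role ('GPU compute', 'Ceilometer storage', etc.) would then be configured to use that flavor, which is the flavor-to-role mapping discussed above.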
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Maybe this is a valid use case? Cloud operator has several core service nodes of differing configuration types. [node1] <-- balanced mix of disk/cpu/ram for general core services [node2] <-- lots of disks for Ceilometer data storage [node3] <-- low-end "appliance like" box for a specialized/custom core service (SIEM box for example) All nodes[1,2,3] are in the same deployment grouping ("core services)". As such, this is a heterogenous deployment grouping. Heterogeneity in this case defined by differing roles and hardware configurations. This is a real use case. How do we handle this? This is the sort of thing I had been concerned with, but I think this is just a variation on Robert's GPU example. Rather than butcher it by paraphrasing, I'll just include the relevant part: "The basic stuff we're talking about so far is just about saying each role can run on some set of undercloud flavors. If that new bit of kit has the same coarse metadata as other kit, Nova can't tell it apart. So the way to solve the problem is: - a) teach Ironic about the specialness of the node (e.g. a tag 'GPU') - b) teach Nova that there is a flavor that maps to the presence of that specialness, and c) teach Nova that other flavors may not map to that specialness then in Tuskar whatever Nova configuration is needed to use that GPU is a special role ('GPU compute' for instance) and only that role would be given that flavor to use. That special config is probably being in a host aggregate, with an overcloud flavor that specifies that aggregate, which means at the TripleO level we need to put the aggregate in the config metadata for that role, and the admin does a one-time setup in the Nova Horizon UI to configure their GPU compute flavor." You mention three specific nodes, but what you're describing is more likely three concepts: - Balanced Nodes - High Disk I/O Nodes - Low-End Appliance Nodes They may have one node in each, but I think your example of three nodes is potentially *too* simplified to be considered as proper sample size. I'd guess there are more than three in play commonly, in which case the concepts breakdown starts to be more appealing. I think the disk flavor in particular has quite a few use cases, especially until SSDs are ubiquitous. I'd want to flag those (in Jay terminology, "the disk hotness") as hosting the data-intensive portions, but where I had previously been viewing that as manual allocation, it sounds like the approach is to properly categorize them for what they are and teach Nova how to use them. Robert - Please correct me if I misread any of what your intention was, I don't want to drive people down the wrong path if I'm misinterpretting anything. -k ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][Tuskar] Terminology
So glad we're hashing this out now. This will save a bunch of headaches in the future. Good call pushing this forward. On 12/11/2013 02:15 PM, Tzu-Mainn Chen wrote: Hi, I'm trying to clarify the terminology being used for Tuskar, which may be helpful so that we're sure that we're all talking about the same thing :) I'm copying responses from the requirements thread and combining them with current requirements to try and create a unified view. Hopefully, we can come to a reasonably rapid consensus on any desired changes; once that's done, the requirements can be updated. * NODE a physical general purpose machine capable of running in many roles. Some nodes may have hardware layout that is particularly useful for a given role. Do we ever need to distinguish between undercloud and overcloud nodes? * REGISTRATION - the act of creating a node in Ironic DISCOVERY - The act of having nodes found auto-magically and added to Ironic with minimal user intervention. * ROLE - a specific workload we want to map onto one or more nodes. Examples include 'undercloud control plane', 'overcloud control plane', 'overcloud storage', 'overcloud compute' etc. * MANAGEMENT NODE - a node that has been mapped with an undercloud role * SERVICE NODE - a node that has been mapped with an overcloud role * COMPUTE NODE - a service node that has been mapped to an overcloud compute role * CONTROLLER NODE - a service node that has been mapped to an overcloud controller role * OBJECT STORAGE NODE - a service node that has been mapped to an overcloud object storage role * BLOCK STORAGE NODE - a service node that has been mapped to an overcloud block storage role * UNDEPLOYED NODE - a node that has not been mapped with a role * another option - UNALLOCATED NODE - a node that has not been allocated through nova scheduler (?) - (after reading lifeless's explanation, I agree that "allocation" may be a misleading term under TripleO, so I personally vote for UNDEPLOYED) Undeployed still sounds a bit odd to me when paired with the word role. I could see deploying a workload "bundle" or something, but a role doesn't feel like a tangible thing that is pushed out somewhere. Unassigned? As in, it hasn't been assigned a role yet. * INSTANCE - A role deployed on a node - this is where work actually happens. I'm fine with "instance", but the the phrasing "a role deployed on a node" feels odd to me in the same way "undeployed" does. Maybe a slight change to "A node that has been assigned a role", but that also may be me being entirely too nit-picky. To put it in context, on a scale of 1-10, my objection to this and "undeployed" is around a 2, so don't let me come off as strenuously objecting. * DEPLOYMENT * SIZE THE ROLES - the act of deciding how many nodes will need to be assigned to each role * another option - DISTRIBUTE NODES (?) - (I think the former is more accurate, but perhaps there's a better way to say it?) * SCHEDULING - the process of deciding which role is deployed on which node I know this derives from a Nova term, but to me, the idea of "scheduling" carries a time-in-the-future connotation to it. The interesting part of what goes on here is the assignment of which roles go to which instances. * SERVICE CLASS - a further categorization within a service role for a particular deployment. I don't understand this one, can you add a few examples? 
* NODE PROFILE - a set of requirements that specify what attributes a node must have in order to be mapped to a service class Even without knowing what "service class" is, I like this one. :) Does this seem accurate? All feedback is appreciated! Mainn Thanks again :D ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] Tuskar CLI after architecture changes
> I will take it little side ways. I think we should be asking why have > we needed the tuskar-api. It has done some more complex logic (e.g. > > building a heat template) or storing additional info, not supported > > by the services we use (like rack associations). > That is a perfectly fine use-case of introducing tuskar-api. > Although now, when everything is shifting to the services themselves, > we don't need tuskar-api for that kind of stuff. Can you please list > what complex operations are left, that should be done in tuskar? I > > think discussing concrete stuff would be best. This is a good call to circle back on that I'm not sure of it either. The wireframes I've seen so far largely revolve around node listing and allocation, but I 100% know I'm oversimplifying it and missing something bigger there. Also, as I have been talking with rdopieralsky, there has been some problems in the past, with tuskar doing more steps in one. Like create a rack and register new nodes in the same time. As those have been separate API calls and there is no transaction handling, we should not do this kind of things in the first place. If we have actions that depends on each other, it should go from UI one by one. Otherwise we will be showing messages like, "The rack has not been created, but 5 from 8 nodes has been added. We have tried to delete those added nodes, but 2 of the 5 deletions has failed. Please figure this out, then you can run this awesome action that calls multiple dependent APIs without real rollback again." (or something like that, depending on what gets created first) This is what I expected to see as the primary argument against it, the lack of a good transactional model for calling the dependent APIs. And it's certainly valid. But what you're describing is the exact same problem regardless if you go from the UI or from the Tuskar API. If we're going to do any sort of higher level automation of things for the user that spans APIs, we're going to run into it. The question is if the client(s) handle it or the API. The alternative is to not have the awesome action in the first place, in which case we're not really giving the user as much value as an application. I am not saying we should not have tuskar-api. Just put there things that belongs there, not proxy everything. > btw. the real path of the diagram is tuskar-ui <-> tuskarclient <-> tuskar-api <-> heatclient <-> heat-api .|ironic|etc. My conclusion -- I say if it can be tuskar-ui <-> heatclient <-> heat-api, lets keep it that way. I'm still fuzzy on what OpenStack means when it says *client. Is that just a bindings library that invokes a remote API or does it also contain the CLI bits? If we realize we are putting some business logic to UI, that needs to be done also to CLI, or we need to store some additional data, that doesn't belong anywhere let's put it in Tuskar-API. Kind Regards, Ladislav Thanks for the feedback :) On 12/11/2013 03:32 PM, Jay Dobies wrote: Disclaimer: I swear I'll stop posting this sort of thing soon, but I'm new to the project. I only mention it again because it's relevant in that I missed any of the discussion on why proxying from tuskar API to other APIs is looked down upon. Jiri and I had been talking yesterday and he mentioned it to me when I started to ask these same sorts of questions. On 12/11/2013 07:33 AM, Jiří Stránský wrote: Hi all, TL;DR: I believe that "As an infrastructure administrator, Anna wants a CLI for managing the deployment providing the same fundamental features as UI." 
With the planned architecture changes (making tuskar-api thinner and getting rid of proxying to other services), there's not an obvious way to achieve that. We need to figure this out. I present a few options and look forward to feedback.

Previously, we had planned Tuskar architecture like this:

tuskar-ui <-> tuskarclient <-> tuskar-api <-> heat-api|ironic-api|etc.

My biggest concern was that having each client call out to the individual APIs directly put a lot of knowledge into the clients that had to be replicated across clients. In the best case, that's simply knowing where to look for data. But I suspect it's bigger than that and there are workflows that will be implemented for tuskar needs. If the tuskar API can't call out to other APIs, that workflow implementation needs to be done at a higher layer, which means in each client.

Something I'm going to talk about later in this e-mail but I'll mention here so that the diagrams sit side-by-side is the potential for a facade layer that hides away the multiple APIs. Lemme see if I can do this in ASCII:

tuskar-ui -+               +-tuskar-api
           |               |
           +-client-facade-+-nova-api
           |               |
tuskar-cli-+               +-heat-api
Re: [openstack-dev] [TripleO] Tuskar CLI after architecture changes
Disclaimer: I swear I'll stop posting this sort of thing soon, but I'm new to the project. I only mention it again because it's relevant in that I missed any of the discussion on why proxying from tuskar API to other APIs is looked down upon. Jiri and I had been talking yesterday and he mentioned it to me when I started to ask these same sorts of questions.

On 12/11/2013 07:33 AM, Jiří Stránský wrote:

Hi all,

TL;DR: I believe that "As an infrastructure administrator, Anna wants a CLI for managing the deployment providing the same fundamental features as UI." With the planned architecture changes (making tuskar-api thinner and getting rid of proxying to other services), there's not an obvious way to achieve that. We need to figure this out. I present a few options and look forward to feedback.

Previously, we had planned Tuskar architecture like this:

tuskar-ui <-> tuskarclient <-> tuskar-api <-> heat-api|ironic-api|etc.

My biggest concern was that having each client call out to the individual APIs directly put a lot of knowledge into the clients that had to be replicated across clients. In the best case, that's simply knowing where to look for data. But I suspect it's bigger than that and there are workflows that will be implemented for tuskar needs. If the tuskar API can't call out to other APIs, that workflow implementation needs to be done at a higher layer, which means in each client.

Something I'm going to talk about later in this e-mail but I'll mention here so that the diagrams sit side-by-side is the potential for a facade layer that hides away the multiple APIs. Lemme see if I can do this in ASCII:

tuskar-ui -+               +-tuskar-api
           |               |
           +-client-facade-+-nova-api
           |               |
tuskar-cli-+               +-heat-api

The facade layer runs client-side and contains the business logic that calls across APIs and adds in the tuskar magic. That keeps the tuskar API from calling into other APIs* but keeps all of the API call logic abstracted away from the UX pieces.

* Again, I'm not 100% up to speed with the API discussion, so I'm going off the assumption that we want to avoid API-to-API calls. If that isn't as strict of a design principle as I'm understanding it to be, then the above picture probably looks kinda silly, so keep in mind the context I'm going from.

For completeness, my gut reaction was expecting to see something like:

tuskar-ui -+
           |
           +-tuskar-api-+-nova-api
           |            |
tuskar-cli-+            +-heat-api

Where a tuskar client talked to the tuskar API to do tuskar things. Whatever was needed to do anything tuskar-y was hidden away behind the tuskar API.

This meant that the "integration logic" of how to use heat, ironic and other services to manage an OpenStack deployment lay within *tuskar-api*. This gave us an easy way towards having a CLI - just build tuskarclient to wrap the abilities of tuskar-api.

Nowadays we talk about using heat and ironic (and neutron? nova? ceilometer?) apis directly from the UI, similarly as Dashboard does. But our approach cannot be exactly the same as in Dashboard's case. Dashboard is quite a thin wrapper on top of python-...clients, which means there's a natural parity between what the Dashboard and the CLIs can do.

When you say python-...clients, is there a distinction between the CLI and a bindings library that invokes the server-side APIs? In other words, the CLI is packaged as CLI+bindings and the UI as GUI+bindings?

We're not wrapping the APIs directly (if wrapping them directly would be sufficient, we could just use Dashboard and not build Tuskar API at all).
We're building a separate UI because we need *additional logic* on top of the APIs. E.g. instead of directly working with Heat templates and Heat stacks to deploy overcloud, user will get to pick how many control/compute/etc. nodes he wants to have, and we'll take care of Heat things behind the scenes. This makes Tuskar UI significantly thicker than Dashboard is, and the natural parity between CLI and UI vanishes. By having this logic in UI, we're effectively preventing its use from CLI. (If i were bold i'd also think about integrating Tuskar with other software which would be prevented too if we keep the business logic in UI, but i'm not absolutely positive about use cases here). I see your point about preventing its use from the CLI, but more disconcerting IMO is that it just doesn't belong in the UI. That sort of logic, the "Heat things behind the scenes", sounds like the jurisdiction of the API (if I'm reading into what that entails correctly). Now this raises a question - how do we get CLI reasonably on par with abilities of the UI? (Or am i wrong that Anna the infrastructure administrator would want that?) To reiterate my point above, I see the idea of getting the CLI on par, but I also see it as striving for a cleaner design as well. Here are some
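To ground the "client-facade" diagrams earlier in this message, below is a rough sketch of what such a client-side layer could look like. It is not a proposal for actual Tuskar code: the endpoint handling, the ControlScale/ComputeScale parameter names, and the "idle node" heuristic are all assumptions made purely for illustration.

```python
# Sketch of a client-side facade shared by tuskar-ui and tuskar-cli.
# All names and parameters here are illustrative assumptions.
from heatclient.v1 import client as heat_client
from novaclient.v1_1 import client as nova_client


class DeploymentFacade(object):
    """Business logic that spans OpenStack APIs, running client-side."""

    def __init__(self, username, password, tenant, auth_url,
                 heat_endpoint, token):
        self.nova = nova_client.Client(username, password, tenant, auth_url)
        self.heat = heat_client.Client(heat_endpoint, token=token)

    def idle_nodes(self):
        # Crude heuristic: hypervisors with nothing scheduled onto them yet.
        return [h for h in self.nova.hypervisors.list() if h.running_vms == 0]

    def deploy_overcloud(self, template_body, control_count, compute_count):
        # The user picks counts; the facade turns them into Heat parameters
        # and takes care of the "Heat things" behind the scenes.
        return self.heat.stacks.create(
            stack_name="overcloud",
            template=template_body,
            parameters={
                "ControlScale": control_count,
                "ComputeScale": compute_count,
            },
        )
```

Whether this logic lives in a shared client library or behind tuskar-api is exactly the trade-off being debated in this thread; the sketch only shows that the UI and CLI could share it either way.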
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Thanks for the explanation! I'm going to claim that the thread revolves around two main areas of disagreement. Then I'm going to propose a way through: a) Manual Node Assignment I think that everyone is agreed that automated node assignment through nova-scheduler is by far the most ideal case; there's no disagreement there. The disagreement comes from whether we need manual node assignment or not. I would argue that we need to step back and take a look at the real use case: heterogeneous nodes. If there are literally no characteristics that differentiate nodes A and B, then why do we care which gets used for what? Why do we need to manually assign one? This is a better way of verbalizing my concerns. I suspect there are going to be quite a few heterogeneous environments built from legacy pieces in the near term and fewer built from the ground up with all new matching hotness. On the other side of it, instead of handling legacy hardware I was worried about the new hotness (not sure why I keep using that term) specialized for a purpose. This is exactly what Robert described in his GPU example. I think his explanation of how to use the scheduler to accommodate that makes a lot of sense, so I'm much less behind the idea of a strict manual assignment than I previously was. If we can agree on that, then I think it would be sufficient to say that we want a mechanism to allow UI users to deal with heterogeneous nodes, and that mechanism must use nova-scheduler. In my mind, that's what resource classes and node profiles are intended for. One possible objection might be: nova scheduler doesn't have the appropriate filter that we need to separate out two nodes. In that case, I would say that needs to be taken up with nova developers. b) Terminology It feels a bit like some of the disagreement come from people using different words for the same thing. For example, the wireframes already details a UI where Robert's roles come first, but I think that message was confused because I mentioned "node types" in the requirements. So could we come to some agreement on what the most exact terminology would be? I've listed some examples below, but I'm sure there are more. node type | role management node | ? resource node | ? unallocated | available | undeployed create a node distribution | size the deployment resource classes | ? node profiles | ? Mainn - Original Message - On 10 December 2013 09:55, Tzu-Mainn Chen wrote: * created as part of undercloud install process By that note I meant, that Nodes are not resources, Resource instances run on Nodes. Nodes are the generic pool of hardware we can deploy things onto. I don't think "resource nodes" is intended to imply that nodes are resources; rather, it's supposed to indicate that it's a node where a resource instance runs. It's supposed to separate it from "management node" and "unallocated node". So the question is are we looking at /nodes/ that have a /current role/, or are we looking at /roles/ that have some /current nodes/. My contention is that the role is the interesting thing, and the nodes is the incidental thing. 
That is, as a sysadmin, my hierarchy of concerns is something like: A: are all services running B: are any of them in a degraded state where I need to take prompt action to prevent a service outage [might mean many things: - software update/disk space criticals/a machine failed and we need to scale the cluster back up/too much load] C: are there any planned changes I need to make [new software deploy, feature request from user, replacing a faulty machine] D: are there long term issues sneaking up on me [capacity planning, machine obsolescence] If we take /nodes/ as the interesting thing, and what they are doing right now as the incidental thing, it's much harder to map that onto the sysadmin concerns. If we start with /roles/ then can answer: A: by showing the list of roles and the summary stats (how many machines, service status aggregate), role level alerts (e.g. nova-api is not responding) B: by showing the list of roles and more detailed stats (overall load, response times of services, tickets against services and a list of in trouble instances in each role - instances with alerts against them - low disk, overload, failed service, early-detection alerts from hardware C: probably out of our remit for now in the general case, but we need to enable some things here like replacing faulty machines D: by looking at trend graphs for roles (not machines), but also by looking at the hardware in aggregate - breakdown by age of machines, summary data for tickets filed against instances that were deployed to a particular machine C: and D: are (F) category work, but for all but the very last thing, it seems clear how to approach this from a roles perspective. I've tried to approach this using /nodes/ as the starting point, and after two terrible drafts I've deleted th
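On the "nova scheduler doesn't have the appropriate filter" objection raised above: writing a filter is a fairly small amount of code, which is a point in favour of taking it up with the Nova developers rather than bypassing the scheduler. The sketch below is hypothetical - it assumes a "node_tag" value reported in host stats and requested via flavor extra specs, neither of which is an existing convention.

```python
# Hypothetical custom scheduler filter, for illustration only.
# Assumes hosts report a "node_tag" stat and flavors request one
# via an extra spec; neither is an existing Nova convention.
from nova.scheduler import filters


class NodeTagFilter(filters.BaseHostFilter):
    """Pass only hosts whose reported tag matches the flavor's request."""

    def host_passes(self, host_state, filter_properties):
        instance_type = filter_properties.get("instance_type") or {}
        wanted = instance_type.get("extra_specs", {}).get("node_tag")
        if not wanted:
            return True  # the flavor doesn't care; any host will do
        return (host_state.stats or {}).get("node_tag") == wanted
```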
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
So the question is are we looking at /nodes/ that have a /current role/, or are we looking at /roles/ that have some /current nodes/. My contention is that the role is the interesting thing, and the nodes is the incidental thing. That is, as a sysadmin, my hierarchy of concerns is something like: A: are all services running B: are any of them in a degraded state where I need to take prompt action to prevent a service outage [might mean many things: - software update/disk space criticals/a machine failed and we need to scale the cluster back up/too much load] C: are there any planned changes I need to make [new software deploy, feature request from user, replacing a faulty machine] D: are there long term issues sneaking up on me [capacity planning, machine obsolescence] If we take /nodes/ as the interesting thing, and what they are doing right now as the incidental thing, it's much harder to map that onto the sysadmin concerns. If we start with /roles/ then can answer: A: by showing the list of roles and the summary stats (how many machines, service status aggregate), role level alerts (e.g. nova-api is not responding) B: by showing the list of roles and more detailed stats (overall load, response times of services, tickets against services and a list of in trouble instances in each role - instances with alerts against them - low disk, overload, failed service, early-detection alerts from hardware C: probably out of our remit for now in the general case, but we need to enable some things here like replacing faulty machines D: by looking at trend graphs for roles (not machines), but also by looking at the hardware in aggregate - breakdown by age of machines, summary data for tickets filed against instances that were deployed to a particular machine C: and D: are (F) category work, but for all but the very last thing, it seems clear how to approach this from a roles perspective. I've tried to approach this using /nodes/ as the starting point, and after two terrible drafts I've deleted the section. I'd love it if someone could show me how it would work:) * Unallocated nodes This implies an 'allocation' step, that we don't have - how about 'Idle nodes' or something. It can be auto-allocation. I don't see problem with 'unallocated' term. Ok, it's not a biggy. I do think it will frame things poorly and lead to an expectation about how TripleO works that doesn't match how it does, but we can change it later if I'm right, and if I'm wrong, well it won't be the first time :). I'm interested in what the distinction you're making here is. I'd rather get things defined correctly the first time, and it's very possible that I'm missing a fundamental definition here. So we have: - node - a physical general purpose machine capable of running in many roles. Some nodes may have hardware layout that is particularly useful for a given role. - role - a specific workload we want to map onto one or more nodes. Examples include 'undercloud control plane', 'overcloud control plane', 'overcloud storage', 'overcloud compute' etc. - instance - A role deployed on a node - this is where work actually happens. - scheduling - the process of deciding which role is deployed on which node. This glossary is really handy to make sure we're all speaking the same language. The way TripleO works is that we defined a Heat template that lays out policy: 5 instances of 'overcloud control plane please', '20 hypervisors' etc. 
Heat passes that to Nova, which pulls the image for the role out of Glance, picks a node, and deploys the image to the node. Note in particular the order: Heat -> Nova -> Scheduler -> Node chosen. The user action is not 'allocate a Node to 'overcloud control plane', it is 'size the control plane through heat'. So when we talk about 'unallocated Nodes', the implication is that users 'allocate Nodes', but they don't: they size roles, and after doing all that there may be some Nodes that are - yes - unallocated, I'm not sure if I should ask this here or to your point above, but what about multi-role nodes? Is there any piece in here that says "The policy wants 5 instances but I can fit two of them on this existing underutilized node and three of them on unallocated nodes" or since it's all at the image level you get just what's in the image and that's the finest-level of granularity? or have nothing scheduled to them. So... I'm not debating that we should have a list of free hardware - we totally should - I'm debating how we frame it. 'Available Nodes' or 'Undeployed machines' or whatever. I just want to get away from talking about something ([manual] allocation) that we don't offer. My only concern here is that we're not talking about cloud users, we're talking about admins adminning (we'll pretend it's a word, come with me) a cloud. To a cloud user, "give me some power so I can do some stuff" is a safe use case if I trust the cloud I'm running on. I t
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 12/06/2013 09:39 PM, Tzu-Mainn Chen wrote: Thanks for the comments and questions! I fully expect that this list of requirements will need to be fleshed out, refined, and heavily modified, so the more the merrier. Comments inline: *** Requirements are assumed to be targeted for Icehouse, unless marked otherwise: (M) - Maybe Icehouse, dependency on other in-development features (F) - Future requirement, after Icehouse * NODES Note that everything in this section should be Ironic API calls. * Creation * Manual registration * hardware specs from Ironic based on mac address (M) Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory stats * IP auto populated from Neutron (F) Do you mean IPMI IP ? I'd say IPMI address managed by Neutron here. * Auto-discovery during undercloud install process (M) * Monitoring * assignment, availability, status * capacity, historical statistics (M) Why is this under 'nodes'? I challenge the idea that it should be there. We will need to surface some stuff about nodes, but the underlying idea is to take a cloud approach here - so we're monitoring services, that happen to be on nodes. There is room to monitor nodes, as an undercloud feature set, but lets be very very specific about what is sitting at what layer. That's a fair point. At the same time, the UI does want to monitor both services and the nodes that the services are running on, correct? I would think that a user would want this. Would it be better to explicitly split this up into two separate requirements? That was my understanding as well, that Tuskar would not only care about the services of the undercloud but the health of the actual hardware on which it's running. As I write that I think you're correct, two separate requirements feels much more explicit in how that's different from elsewhere in OpenStack. * Management node (where triple-o is installed) This should be plural :) - TripleO isn't a single service to be installed - We've got Tuskar, Ironic, Nova, Glance, Keystone, Neutron, etc. I misspoke here - this should be "where the undercloud is installed". My current understanding is that our initial release will only support the undercloud being installed onto a single node, but my understanding could very well be flawed. * created as part of undercloud install process * can create additional management nodes (F) * Resource nodes ^ nodes is again confusing layers - nodes are what things are deployed to, but they aren't the entry point * searchable by status, name, cpu, memory, and all attributes from ironic * can be allocated as one of four node types Not by users though. We need to stop thinking of this as 'what we do to nodes' - Nova/Ironic operate on nodes, we operate on Heat templates. Right, I didn't mean to imply that users would be doing this allocation. But once Nova does this allocation, the UI does want to be aware of how the allocation is done, right? That's what this requirement meant. * compute * controller * object storage * block storage * Resource class - allows for further categorization of a node type * each node type specifies a single default resource class * allow multiple resource classes per node type (M) Whats a node type? Compute/controller/object storage/block storage. Is another term besides "node type" more accurate? 
* optional node profile for a resource class (M) * acts as filter for nodes that can be allocated to that class (M) I'm not clear on this - you can list the nodes that have had a particular thing deployed on them; we probably can get a good answer to being able to see what nodes a particular flavor can deploy to, but we don't want to be second guessing the scheduler.. Correct; the goal here is to provide a way through the UI to send additional filtering requirements that will eventually be passed into the scheduler, allowing the scheduler to apply additional filters. * nodes can be viewed by node types * additional group by status, hardware specification *Instances* - e.g. hypervisors, storage, block storage etc. * controller node type Again, need to get away from node type here. * each controller node will run all openstack services * allow each node to run specified service (F) * breakdown by workload (percentage of cpu used per node) (M) * Unallocated nodes This implies an 'allocation' step, that we don't have - how about 'Idle nodes' or something. Is it imprecise to say that nodes are allocated by the scheduler? Would something like 'active/idle' be better? * Archived nodes (F) * Will be se
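As a footnote to the registration discussion above ("IPMI address + MAC for each NIC + disk/cpu/memory stats"), the manual path through python-ironicclient might look roughly like the sketch below. The driver choice and all values are placeholders, not a recommendation for what Tuskar should generate.

```python
# Sketch of manual node registration with the data Ironic wants today.
# Driver name, credentials, and hardware values are placeholders.
from ironicclient import client

ironic = client.get_client(
    1,
    os_username="admin",
    os_password="secret",
    os_tenant_name="admin",
    os_auth_url="http://undercloud.example.com:5000/v2.0",
)

node = ironic.node.create(
    driver="pxe_ipmitool",
    driver_info={
        "ipmi_address": "10.0.0.10",     # the IPMI IP discussed above
        "ipmi_username": "root",
        "ipmi_password": "secret",
    },
    properties={"cpus": 8, "memory_mb": 16384, "local_gb": 500},
)

# One port per NIC, keyed by MAC address.
ironic.port.create(node_uuid=node.uuid, address="52:54:00:12:34:56")
```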
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
I believe we are still 'fighting' here with two approaches and I believe we need both. We can't only provide a way 'give us resources we will do a magic'. Yes this is preferred way - especially for large deployments, but we also need a fallback so that user can say - no, this node doesn't belong to the class, I don't want it there - unassign. Or I need to have this node there - assign. +1 to this. I think there are still a significant amount of admins out there that are really opposed to magic and want that fine-grained control. Even if they don't use it that frequently, in my experience they want to know it's there in the event they need it (and will often dream up a case that they'll need it). I'm absolutely for pushing the magic approach as the preferred use. And in large deployments that's where people are going to see the biggest gain. The fine-grained approach can even be pushed off as a future feature. But I wouldn't be surprised to see people asking for it and I'd like to at least be able to say it's been talked about. - As an infrastructure administrator, Anna wants to be able to view the history of nodes that have been in a deployment. Why? This is super generic and could mean anything. I believe this has something to do with 'archived nodes'. But correct me if I am wrong. -- Jarda ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Disclaimer: I'm very new to the project, so apologies if some of my questions have been already answered or flat out don't make sense. As I proofread, some of my comments may drift a bit past basic requirements, so feel free to tell me to take certain questions out of this thread into specific discussion threads if I'm getting too detailed. *** Requirements are assumed to be targeted for Icehouse, unless marked otherwise: (M) - Maybe Icehouse, dependency on other in-development features (F) - Future requirement, after Icehouse * NODES * Creation * Manual registration * hardware specs from Ironic based on mac address (M) * IP auto populated from Neutron (F) * Auto-discovery during undercloud install process (M) * Monitoring * assignment, availability, status * capacity, historical statistics (M) * Management node (where triple-o is installed) * created as part of undercloud install process * can create additional management nodes (F) * Resource nodes * searchable by status, name, cpu, memory, and all attributes from ironic * can be allocated as one of four node types It's pretty clear by the current verbiage but I'm going to ask anyway: "one and only one"? * compute * controller * object storage * block storage * Resource class - allows for further categorization of a node type * each node type specifies a single default resource class * allow multiple resource classes per node type (M) My gut reaction is that we want to bite this off sooner rather than later. This will have data model and API implications that, even if we don't commit to it for Icehouse, should still be in our minds during it, so it might make sense to make it a first class thing to just nail down now. * optional node profile for a resource class (M) * acts as filter for nodes that can be allocated to that class (M) To my understanding, once this is in Icehouse, we'll have to support upgrades. If this filtering is pushed off, could we get into a situation where an allocation created in Icehouse would no longer be valid in Icehouse+1 once these filters are in place? If so, we might want to make it more of a priority to get them in place earlier and not eat the headache of addressing these sorts of integrity issues later. * nodes can be viewed by node types * additional group by status, hardware specification * controller node type * each controller node will run all openstack services * allow each node to run specified service (F) * breakdown by workload (percentage of cpu used per node) (M) * Unallocated nodes Is there more still being flushed out here? Things like: * Listing unallocated nodes * Unallocating a previously allocated node (does this make it a vanilla resource or does it retain the resource type? is this the only way to change a node's resource type?) * Unregistering nodes from Tuskar's inventory (I put this under unallocated under the assumption that the workflow will be an explicit unallocate before unregister; I'm not sure if this is the same as "archive" below). * Archived nodes (F) Can you elaborate a bit more on what this is? 
* Will be separate openstack service (F) * DEPLOYMENT * multiple deployments allowed (F) * initially just one * deployment specifies a node distribution across node types * node distribution can be updated after creation * deployment configuration, used for initial creation only * defaulted, with no option to change * allow modification (F) * review distribution map (F) * notification when a deployment is ready to go or whenever something changes * DEPLOYMENT ACTION * Heat template generated on the fly * hardcoded images * allow image selection (F) * pre-created template fragments for each node type * node type distribution affects generated template * nova scheduler allocates nodes * filters based on resource class and node profile information (M) * Deployment action can create or update * status indicator to determine overall state of deployment * status indicator for nodes as well * status includes 'time left' (F) * NETWORKS (F) * IMAGES (F) * LOGS (F) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
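For the "Heat template generated on the fly" items above, one way to picture the pre-created fragments is a simple merge keyed by the node-type distribution, roughly as sketched below. This is only an illustration of the idea - the fragment format, naming scheme, and merge rules are assumptions, not the planned implementation.

```python
# Illustrative sketch: build a Heat template from per-node-type fragments,
# repeating each fragment according to the chosen distribution.
import json

def generate_template(fragments, distribution):
    """fragments: {node_type: {resource_name: resource_def}} (assumed format)
    distribution: {node_type: count}"""
    template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": {}}
    for node_type, count in distribution.items():
        for i in range(count):
            for name, resource in fragments[node_type].items():
                template["Resources"]["%s%d" % (name, i)] = resource
    return json.dumps(template, indent=2)

# e.g. generate_template(fragments, {"controller": 1, "compute": 20})
```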
Re: [openstack-dev] [Tripleo] Core reviewer update Dec
On 12/06/2013 12:26 PM, Clint Byrum wrote: Excerpts from Robert Collins's message of 2013-12-03 23:12:39 -0800: Hi, like most OpenStack projects we need to keep the core team up to date: folk who are not regularly reviewing will lose context over time, and new folk who have been reviewing regularly should be trusted with -core responsibilities. In this months review: - Ghe Rivero for -core +1, We've been getting good reviews from Ghe for a while now. :) - Jan Provaznik for removal from -core - Jordan O'Mara for removal from -core - Martyn Taylor for removal from -core - Jiri Tomasek for removal from -core - Jamomir Coufal for removal from -core I suggest we delay this removal for 30 days. For what it's worth, keep in mind the holidays coming up at the end of December. I suspect that trying to reevaluate 30 days from now will be even trickier when you have to take into account vacation times. I know it is easy to add them back in, but I hesitate to disrupt the flow if these people all are willing to pick up the pace again. They may not have _immediate_ code knowledge but they should have enough historical knowledge that has not gone completely stale in just the last 30-60 days. What I'm suggesting is that review velocity will benefit from core being a little more sticky, especially for sustained contributors who have just had their attention directed elsewhere briefly. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][Tuskar] Questions around Development Process
a) Because we're essentially doing a tear-down and re-build of the whole architecture (a lot of the concepts in tuskar will simply disappear), it's difficult to do small incremental patches that support existing functionality. Is it okay to have patches that break functionality? Are there good alternatives? This is an incubating project, so there are no api stability promises. If a patch breaks some functionality that we've decided to not support going forward I don't see a problem with it. That said, if a patch breaks some functionality that we _do_ plan to keep, I'd prefer to see it done as a series of dependent commits that end with the feature in a working state again, even if some of the intermediate commits are not fully functional. Hopefully that will both keep the commit sizes down and provide a definite path back to functionality. Is there any sort of policy or convention of sending out a warning before that sort of thing is merged in so that people don't accidentally blindly pull master and break something they were using? b) In the past, we allowed parallel development of the UI and API by having well-documented expectations of what the API Are these expectations documented yet? I'm new to the project and still finding my way around. I've seen the wireframes and am going through Chen's icehouse requirements, but I haven't stumbled on too much talk about the APIs specifically (not suggesting they don't exist, more likely that I haven't found them yet). would provide. We would then mock those calls in the UI, replacing them with real API calls as they became available. Is this acceptable? This sounds reasonable to me. -Ben ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][TripleO] Nested resources
Along the same lines and while we're talking crazy ideas, one use case where a user might want to allocate entire nodes would be if TripleO were used to manage an ARM rack. The use cases aren't identical between cloud and ARM, but they are similar. So for a rack of 1000 nodes, there is benefit in certain cases for a user not only taking an entire node, but a collection of nodes co-located in the same rack to take advantage of the rack fabric. Again, crazy ideas and probably outside of the scope of things we want to bite off immediately. But as we're in the early stages of the Tuskar data and security models, it might make sense to at least keep in mind how we could play in this area as well. On 12/05/2013 08:11 PM, Fox, Kevin M wrote: I think the security issue can be handled by not actually giving the underlying resource to the user in the first place. So, for example, if I wanted a bare metal node's worth of resource for my own containering, I'd ask for a bare metal node and use a "blessed" image that contains docker+nova bits that would hook back to the cloud. I wouldn't be able to login to it, but containers started on it would be able to access my tenant's networks. All access to it would have to be through nova suballocations. The bare resource would count against my quotas, but nothing run under it. Come to think of it, this sounds somewhat similar to what is planned for Neutron service vm's. They count against the user's quota on one level but not all access is directly given to the user. Maybe some of the same implementation bits could be used. Thanks, Kevin From: Mark McLoughlin [mar...@redhat.com] Sent: Thursday, December 05, 2013 1:53 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova][TripleO] Nested resources Hi Kevin, On Mon, 2013-12-02 at 12:39 -0800, Fox, Kevin M wrote: Hi all, I just want to run a crazy idea up the flag pole. TripleO has the concept of an under and over cloud. In starting to experiment with Docker, I see a pattern start to emerge. * As a User, I may want to allocate a BareMetal node so that it is entirely mine. I may want to run multiple VM's on it to reduce my own cost. Now I have to manage the BareMetal nodes myself or nest OpenStack into them. * As a User, I may want to allocate a VM. I then want to run multiple Docker containers on it to use it more efficiently. Now I have to manage the VM's myself or nest OpenStack into them. * As a User, I may want to allocate a BareMetal node so that it is entirely mine. I then want to run multiple Docker containers on it to use it more efficiently. Now I have to manage the BareMetal nodes myself or nest OpenStack into them. I think this can then be generalized to: As a User, I would like to ask for resources of one type (One AZ?), and be able to delegate resources back to Nova so that I can use Nova to subdivide and give me access to my resources as a different type. (As a different AZ?) I think this could potentially cover some of the TripleO stuff without needing an over/under cloud. For that use case, all the BareMetal nodes could be added to Nova as such, allocated by the "services" tenant as running a nested VM image type resource stack, and then made available to all tenants. Sys admins then could dynamically shift resources from VM providing nodes to BareMetal Nodes and back as needed. This allows a user to allocate some raw resources as a group, then schedule higher level services to run only in that group, all with the existing api. 
Just how crazy an idea is this? FWIW, I don't think it's a crazy idea at all - indeed I mumbled something similar a few times in conversation with random people over the past few months :) With the increasing interest in containers, it makes a tonne of sense - you provision a number of VMs and now you want to carve them up by allocating containers on them. Right now, you'd need to run your own instance of Nova for that ... which is far too heavyweight. It is a little crazy in the sense that it's a tonne of work, though. There's not a whole lot of point in discussing it too much until someone shows signs of wanting to implement it :) One problem is how the API would model this nesting, another problem is making the scheduler aware that some nodes are only available to the tenant which owns them but maybe a bigger problem is the security model around allowing a node managed by an untrusted become a compute node. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@li
Re: [openstack-dev] [TripleO] capturing build details in images
On 12/05/2013 08:38 AM, James Slagle wrote:

On Wed, Dec 4, 2013 at 5:19 PM, Robert Collins wrote:

This is a follow up to https://review.openstack.org/59621 to get broader discussion.

So at the moment we capture a bunch of details in the image - what parameters the image was built with and some environment variables. Last week we were capturing everything, which there is broad consensus was too much, but it seems to me that that is based on two things:
- the security ramifications of unanticipated details being baked into the image
- many variables being irrelevant most of the time

I think those are both good points. But... the problem with diagnostic information is you don't know that you need it until you don't have it. I'm particularly worried that things like bad http proxies, and third party elements that need variables we don't know about, will be undiagnosable. Forcing everything through a DIB_FOO variable thunk seems like just creating work for ourselves - I'd like to avoid that. Further, some variables we should capture (like http_proxy) have passwords embedded in them, so even whitelisting what variables to capture doesn't solve the general problem.

So - what about us capturing this information outside the image: we can create a uuid for the build, and write a file in the image with that uuid, and outside the image we can write:
- all variables (no security ramifications now as this file can be kept by whomever built the image)
- command line args
- version information for the toolchain etc.

+1. I like this idea a lot. What about making the uuid file written outside of the image be in json format so it's easily machine parseable? Something like dib-<uuid>.json would contain:

{
  "environment": {
    "DIB_NO_TMPFS": "1",
    ...
  },
  "dib": {
    "command-line": ...,
    "version": ...
  }
}

Could keep adding additional things like the list of elements used, build time, etc.

+1 to having a machine parsable version. Is that going to be a standard schema for all images or will there be an open-ended section that contains key/value pairs that are contingent on the actual type of image being built?

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
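A small sketch of how the build tooling could emit the record discussed above, alongside (not inside) the image. The file name, the open-ended "extra" section, and the helper itself are assumptions layered on the schema proposed in the thread.

```python
# Sketch: write the per-build record outside the image, keyed by a uuid.
# Keys and the "extra" section are assumptions based on the thread above.
import json
import os
import sys
import uuid

def write_build_record(output_dir, dib_version, extra=None):
    build_id = str(uuid.uuid4())
    record = {
        "environment": dict(os.environ),   # full env; kept only by whoever
                                           # built the image, not baked in
        "dib": {
            "command-line": " ".join(sys.argv),
            "version": dib_version,
        },
        "extra": extra or {},              # open-ended, per-image-type section
    }
    path = os.path.join(output_dir, "dib-%s.json" % build_id)
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return build_id, path
```

The matching in-image artifact would then only need to carry the uuid, which addresses the security concern while keeping the diagnostics available to whoever ran the build.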