Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

Will Foster Thu, 12 Dec 2013 09:30:56 -0800

On 12/12/13 09:42 +1300, Robert Collins wrote:

On 12 December 2013 01:17, Jaromir Coufal <[email protected]> wrote:

On 2013/10/12 23:09, Robert Collins wrote:

The 'easiest' way is to support bigger companies with huge deployments,
tailored infrastructure, everything connected properly.

But there are tons of companies/users who are running on old
heterogeneous hardware, very likely even more than the number of
companies with the large deployments mentioned above. If we give them
only the option of 'setting up rules' to get a service onto a node,
this type of user is not going to use our deployment system.



That's speculation. We don't know if they will or will not because we
haven't given them a working system to test.


Some of that is speculation, and some of it is feedback from people
who are doing deployments (of course it's just a very limited audience).
Anyway, it is not just pure theory.


Sure. Let me be more precise. There is a hypothesis that lack of
direct control will be a significant adoption blocker for a primary
group of users.

I think it's safe to say that some users in the group 'sysadmins
having to deploy an OpenStack cloud' will find it a bridge too far and
not use a system without direct control. Call this group A.

I think it's also safe to say that some users will not care in the
slightest, because their deployment is too small for them to be
particularly worried (e.g. about occasional downtime, though they would
worry a lot about data loss). Call this group B.

I suspect we don't need to consider group C - folk who won't use a
system if it *has* manual control, but that's only a suspicion. It may
be that the side effect of adding direct control is to reduce
usability below the threshold some folk need...

To assess 'significant adoption blocker' we basically need to find the
% of users who will care sufficiently that they don't use TripleO.

How can we do that? We can do questionnaires, and get such folk to
come talk with us, but that suffers from selection bias - group B can
use the system with or without direct manual control, so they have little
motivation to argue vigorously in any particular direction. Group A,
however, have to argue, because they won't use the system at all without
that feature, and they may want to use the system for other reasons,
so that becomes a crucial aspect for them.

A much better way IMO is to test it - to get a bunch of volunteers and
see who responds positively to a demo *without* direct manual control.

To do that we need a demoable thing, which might just be mockups that
show a set of workflows (and include things like Jay's
shiny-new-hardware use case in the demo).

I rather suspect we're building that anyway as part of doing UX work,
so maybe what we do is put a tweet or blog post up asking for
sysadmins who a) have not yet deployed OpenStack, b) want to, and c)
are willing to spend 20-30 minutes with us. We walk them through a demo
showing no manual control and record what questions they ask and
whether they would want that product, and if not, then
(a) what use cases they can't address with the mockups and (b) what
other reasons they have for not using it.
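The sample-size side of this can be made concrete. Here is a minimal sketch, assuming each demo session yields a plain yes/no answer ("would not adopt without direct manual control"). The Wilson score interval is one standard way to bound the true share of such users from a small sample. The counts below are made up for illustration. The interval only captures sampling noise: it assumes the volunteers are an unbiased sample and does nothing about the selection bias described above.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("need at least one respondent")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical numbers: 6 of 20 volunteers say they would not adopt without
# direct manual control (i.e. they behave like "group A").
low, high = wilson_interval(6, 20)
print(f"group A share: {low:.0%} to {high:.0%}")  # group A share: 15% to 52%
```

With 20 hypothetical volunteers, the plausible range for group A still runs from roughly a seventh to about half of users. A useful answer would therefore need considerably more sessions.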

This is a bunch of work though!

So, do we need to do that work?

*If* we can layer manual control on later, then we could defer this
testing until we are at the point where we can say 'the Nova-scheduled
version is ready, now let's decide if we add the manual control'.

OTOH, if we *cannot* layer manual control on later - if it has
tentacles through too much of the code base - then we need to decide
earlier, because it will be significantly harder to add later, and that
may push the ship date too late for vendors shipping on top of TripleO.

So with that as a prelude, my technical sense is that we can layer
manual scheduling on later: we provide an advanced screen, show the
list of N instances we're going to ask for and allow each instance to
be directly customised with a node id selected from either the current
node it's running on or an available node. It's significant work, in both
the UI and the plumbing, but it's not going to be made harder by the other
work we're doing AFAICT.
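Here is a minimal sketch of why this layering seems feasible. The manual step can be a validation pass over the plan the automatic scheduler already produced, so it does not need to reach into the scheduler itself. Every name here (`Assignment`, `apply_overrides`, the node ids) is hypothetical and is not a Tuskar, Nova or Heat API. The sketch also assumes one instance per baremetal node. Because it returns the final mapping without deploying anything, the same pass could double as the 'preview' of scheduler changes requested later in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Assignment:
    instance: str                # hypothetical instance name, e.g. "compute-0"
    current_node: Optional[str]  # node the instance runs on today (None if new)
    planned_node: str            # node chosen by the automatic scheduler


def apply_overrides(
    plan: List[Assignment],
    overrides: Dict[str, str],
    available_nodes: Set[str],
) -> Dict[str, str]:
    """Return the final {instance: node_id} mapping without deploying anything.

    Instances without an override keep the scheduler's choice. An override must
    name either the instance's current node or a node in the available pool.
    """
    known = {a.instance for a in plan}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"overrides for unknown instances: {sorted(unknown)}")

    result: Dict[str, str] = {}
    for a in plan:
        node = overrides.get(a.instance, a.planned_node)
        if a.instance in overrides and node != a.current_node and node not in available_nodes:
            raise ValueError(f"{node} is neither {a.instance}'s current node nor available")
        result[a.instance] = node

    # Assumption: one instance per baremetal node, so reject double assignments.
    if len(set(result.values())) != len(result):
        raise ValueError("two instances were assigned to the same node")
    return result


plan = [
    Assignment("control-0", "node-1", "node-1"),
    Assignment("compute-0", None, "node-2"),
    Assignment("compute-1", None, "node-3"),
]
# The operator pins compute-1 to node-7 instead of the scheduler's node-3.
preview = apply_overrides(plan, {"compute-1": "node-7"}, {"node-2", "node-3", "node-7"})
print(preview)  # {'control-0': 'node-1', 'compute-0': 'node-2', 'compute-1': 'node-7'}
```

Rejecting bad overrides up front, instead of letting the scheduler fail later, keeps the automatic path untouched for users who never open the advanced screen.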

-> My proposal is that we shelve this discussion until we have the
nova/heat scheduled version in 'and now we polish' mode, and then pick
it back up and assess user needs.

An alternative argument is to say that group A is a majority of the
userbase and that doing an automatic version is entirely unnecessary.
That's also possible, but I'm extremely skeptical, given the huge cost
of staff time, and the complete lack of interest my sysadmin friends
(and my former sysadmin self) have in doing automatable things by
hand.


I just wanted to add a few thoughts:

For some comparative information 'from the field': I work
extensively on deployments of large OpenStack implementations,
most recently a ~220-node/9-rack deployment (scaling up to 42 racks /
1024 nodes soon). My primary role is DevOps/sysadmin in nature rather
than a specific development area, so rapid provisioning, tooling and
automation is the area I work in almost exclusively (mostly API-driven,
using Foreman/Puppet). The infrastructure our small team designs/builds
supports our development and business.


I am the target user base you'd probably want to cater to.

I can tell you the philosophy and mechanics of Tuskar/OOO are great, and
it's something I'd love to start using extensively, but there are some
aspects of control that I feel should be added (arguably less for me
and more for my ilk who are looking to expand their OpenStack
footprint).

* ability to 'preview' changes going to the scheduler
* ability to override/change some aspects within node assignment
* ability to view at least minimal logging from within Tuskar UI

Here's the main reason - most new adopters of OpenStack/IaaS are going to be
running legacy/mixed hardware, and while they might have an initiative to
explore and invest, and even a decent budget, most of them are not going to have
completely identical hardware, isolated/flat networks and things set
aside in such a way that blind auto-discovery/deployment will just work all
the time.

There will be a need to adjust sometimes, and those coming from a more
vertically-scaling infrastructure (most large orgs) will not have
100% matching standards in place for vendor, machine spec and network
design, which may make Tuskar/OOO seem inflexible and 'one-way'. This
may just be a carry-over or fear of the old ways of deployment, but
nonetheless it is present.

In my case, we're lucky enough to have dedicated, near-identical
equipment and a flexible network design we architected beforehand, which
makes Tuskar/OOO a great fit. Most people will not have this
greenfield ability and will initially use what they have lying around,
so as not to make a big investment until they have built familiarity
with and trust in something new.

That said, I've been working with Jaromir Coufal on some UI mockups of
Tuskar with some of this 'advanced' functionality included, and from
my perspective it looks like something to consider pulling in sooner rather
than later if you want to maximize the adoption of new users.

Thanks,

-will

Let's break the concern into two halves:
A) Users who could have their needs met, but won't use TripleO because
meeting their needs in this way is too hard/complex/painful.

B) Users who have a need we cannot meet with the current approach.

For category B users, their needs might be specific HA things - like
the oft-discussed failure domains angle, where we need to split up HA
clusters across power bars, aircon, switches etc. Clearly, long term we
want to support them, and the undercloud Nova scheduler is entirely
capable of being informed about this, and we can evolve to a holistic
statement over time. Let's get a concrete list of the cases we can
think of today that won't be well supported initially, and we can
figure out where to do the work to support them properly.
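To make the failure-domains case concrete, here is a minimal sketch. It assumes each node is tagged with the failure domains it sits in (the `power`/`switch` keys and node names are hypothetical) and picks k nodes for an HA cluster so that no two members share any domain. This is a brute-force search for illustration only. It is not how the Nova scheduler implements this, and it would need a smarter algorithm at the 1024-node scale mentioned in this thread.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence

# Hypothetical inventory: each node is tagged with the failure domains it sits in.
Inventory = Dict[str, Dict[str, str]]


def spread_across_domains(
    nodes: Inventory, k: int, domain_keys: Sequence[str]
) -> Optional[List[str]]:
    """Pick k nodes so that no two share a value for any of domain_keys.

    Brute force over combinations: fine for a rack or two, far too slow for
    1024 nodes. Returns the first valid placement in sorted order, or None
    if no such placement exists.
    """
    for combo in combinations(sorted(nodes), k):
        if all(len({nodes[n][key] for n in combo}) == k for key in domain_keys):
            return list(combo)
    return None


inventory: Inventory = {
    "node-1": {"power": "pdu-a", "switch": "sw-1"},
    "node-2": {"power": "pdu-a", "switch": "sw-2"},
    "node-3": {"power": "pdu-b", "switch": "sw-1"},
    "node-4": {"power": "pdu-b", "switch": "sw-2"},
    "node-5": {"power": "pdu-c", "switch": "sw-3"},
}
print(spread_across_domains(inventory, 3, ("power", "switch")))  # ['node-1', 'node-4', 'node-5']
print(spread_across_domains(inventory, 4, ("power",)))           # None: only three PDUs
```

A `None` result is the interesting case for this discussion. It is exactly the kind of concrete, currently undeliverable requirement that the list of cases asked for above would capture.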


My question is - can't we help them now? Can't we enable users to use our app even
when we aren't yet smart enough to help them the 'auto' way?


I understand the question, but I can't answer it until we have *an*
example that is both real and not deliverable today. At the moment the
only one we know of is HA, and that's certainly an important feature on
the Nova-scheduled side, so doing manual control to deliver a future
automatic feature doesn't make a lot of sense to me. Crawl, walk, run.

This is a great point. It's very manual and we can do all of it hugely better. But
we can't do anything about that until we have all the new shiny features in (and
it will take time to figure out the best way to do that properly). Can
we help them now? Can we scale our potential user base, get them in early,
and get more feedback on their requirements, needs and expectations?


I'm desperate for us to scale our user base.

Right now we're blocked on the nova baremetal-preserve-ephemeral
rebuild blueprint, and then after that heat rolling deploys. *those*
are absolutely critical, regardless of what goes in Tuskar or Tuskar
UI - they are baseline 'the system doesn't work otherwise' aspects,
which will have a profound impact on the ability to sensibly use
TripleO.

I just want to add one more important point. The whole time we talk about
satisfying users' needs, but the other aspect is their psychology (and
fulfilling their expectations). We can cover all they need, but they still
might want to 'feel' the power of control. Note, this is not just my
prejudice; I asked and discussed that with a couple of people, and I hope that
folks will jump in to confirm.


Certainly - I agree psychology is an important part of this, and it's
not one we can answer from first principles. It is however also one we
can't answer by exemplar: we need to know the population occurrence
rates for each archetype we encounter, and that means getting out and
recruiting an unbiased sample somehow.

-Rob


_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev