On 08/04/2018 07:35 PM, Michael Glasgow wrote:
> On 8/2/2018 7:27 PM, Jay Pipes wrote:
>> It's not an exception. It's normal course of events. NoValidHosts
>> means there were no compute nodes that met the requested resource
>> amounts.
>
> To clarify, I didn't mean a Python exception.

Neither did I. I was referring to exceptional behaviour, not a Python
exception.

> I concede that I should've chosen a better word for the type of
> object I have in mind.
>
>> If a SELECT statement against an Oracle DB returns 0 rows, is that an
>> exception? No. Would an operator need to re-send the SELECT statement
>> with an EXPLAIN SELECT in order to get information about what indexes
>> were used to winnow the result set (to zero)? Yes. Either that, or the
>> operator would need to gradually re-execute smaller SELECT statements
>> containing fewer filters in order to determine which join or predicate
>> caused a result set to contain zero rows.
>
> I'm not sure if this analogy fully appreciates the perspective of the
> operator. You're correct of course that if you select on a db and the
> correct answer is zero rows, then zero rows is the right answer, 100% of
> the time.
>
> Whereas what I thought we meant when we talk about "debugging no valid
> host failures" is that zero rows is *not* the right answer, and yet
> you're getting zero rows anyway.

No, "debugging no valid host failures" doesn't mean that zero rows is
the wrong answer. It means "find out why Nova thinks there's nowhere
that my instance will fit".

> So yes, absolutely with an Oracle DB you would get an ORA-XXXXX
> exception in that case, along with a trace file that told you where
> things went off the rails. Which is exactly what we don't have
> here.

That is precisely the opposite of what I was saying. Again, getting no
results is *not* an error. It's normal behaviour and indicates there
were no compute hosts that met the requirements of the request. This is
not an error or exceptional behaviour. It's simply the result of a query
against the placement database.
If you get zero rows returned, that means you need to determine what
part of your request caused the winnowed result set to go from >0 rows
to 0 rows.
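To make that concrete, here is a minimal sketch of the manual approach from the SELECT analogy, in Python against an in-memory SQLite table. The table, its columns, the predicates and the find_culprit() helper are all hypothetical stand-ins, not the real placement schema (which spreads inventories, allocations and traits across several tables). It only illustrates re-running the query with one more predicate each time and stopping at the first one that takes the count to zero.

```python
import sqlite3

# Hypothetical, simplified provider table; NOT the real placement schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE providers "
    "(name TEXT, free_vcpu INTEGER, free_ram_mb INTEGER, has_ssd INTEGER)"
)
conn.executemany(
    "INSERT INTO providers VALUES (?, ?, ?, ?)",
    [
        ("cn1", 8, 4096, 0),
        ("cn2", 2, 16384, 1),
        ("cn3", 16, 2048, 1),
    ],
)

# Predicates of a hypothetical request, applied cumulatively in this order.
predicates = [
    ("free_vcpu >= ?", 4),
    ("free_ram_mb >= ?", 4096),
    ("has_ssd = ?", 1),
]


def find_culprit(conn, predicates):
    """Add predicates one at a time and return the first one after which
    the query matches no rows, or None if the full query returns rows."""
    clauses, params = [], []
    for clause, value in predicates:
        clauses.append(clause)
        params.append(value)
        sql = "SELECT COUNT(*) FROM providers WHERE " + " AND ".join(clauses)
        (count,) = conn.execute(sql, params).fetchone()
        if count == 0:
            return clause, value
    return None


print(find_culprit(conn, predicates))  # ('has_ssd = ?', 1)
```

Note that the answer depends on predicate order: has_ssd = 1 is reported because it is the filter that emptied the set, even though two of the three hosts do have an SSD. The real cause is the combination of constraints, which is why intermediate counts are more useful than a single yes/no.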
And what we've been discussing is exactly the process by which such an
investigation could be done. There are two options: do the investigation
*inline* as part of the original request or do it *offline* after the
original request returns 0 rows.
Doing it inline means splitting the large query we currently construct
into multiple queries (for each related group of requested resources
and/or traits) and logging the number of results grabbed for each of
those queries.
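For the inline option, a rough Python sketch of what "one query per group, log the count" could look like follows. The provider records, group labels, logger name and the matches_group()/count_per_group() helpers are hypothetical, and plain Python filtering stands in for placement's SQL; VCPU, MEMORY_MB and HW_CPU_X86_AVX2 are just example resource class and trait names. The sketch also simplifies by requiring a single provider to satisfy every group, ignoring provider trees.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
LOG = logging.getLogger("placement_sketch")  # hypothetical logger name

# Hypothetical snapshot of free capacity and traits per compute node.
PROVIDERS = [
    {"name": "cn1", "free": {"VCPU": 8, "MEMORY_MB": 4096}, "traits": set()},
    {"name": "cn2", "free": {"VCPU": 2, "MEMORY_MB": 16384},
     "traits": {"HW_CPU_X86_AVX2"}},
]


def matches_group(provider, group):
    """True if the provider has enough free capacity for every requested
    resource class and carries every required trait in the group."""
    enough = all(provider["free"].get(rc, 0) >= amount
                 for rc, amount in group["resources"].items())
    return enough and group["traits"] <= provider["traits"]


def count_per_group(providers, groups):
    """Evaluate each requested group separately, then all groups together,
    logging how many providers matched at each step."""
    counts = {}
    for label, group in groups.items():
        matched = [p["name"] for p in providers if matches_group(p, group)]
        counts[label] = len(matched)
        LOG.debug("group %r matched %d provider(s): %s",
                  label, len(matched), matched)
    combined = [p["name"] for p in providers
                if all(matches_group(p, g) for g in groups.values())]
    counts["<all groups>"] = len(combined)
    LOG.debug("all groups together matched %d provider(s): %s",
              len(combined), combined)
    return counts


request = {
    "compute": {"resources": {"VCPU": 4, "MEMORY_MB": 2048}, "traits": set()},
    "avx2": {"resources": {}, "traits": {"HW_CPU_X86_AVX2"}},
}
print(count_per_group(PROVIDERS, request))
# {'compute': 1, 'avx2': 1, '<all groups>': 0}
```

Every per-group count is non-zero here, yet the request as a whole matches nothing, because no single node has both the spare VCPUs and the trait. That argues for logging the combined count alongside the per-group ones.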
Doing it offline means developing some diagnostic tool that an operator
could run (similar to what Wind River did with [1]). The issue with that
is that such a tool can only represent resource usage at the time the
tool is run, not when the original request that returned 0 rows ran.
[1]
https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-94f87e728df6465becce5241f3da53c8R330

> If I understand your perspective correctly, it's basically that
> placement is working as designed, so there's nothing more to do except
> pore over debug output. Can we consider:
> (1) that might not always be true if there are bugs

Bugs in the placement service are an entirely separate issue. They do
occur, of course, but we're not talking about that here.

> (2) even when it is technically true, from the user's perspective, I'd
> posit that it's rare that a user requests an instance with the express
> intent of not launching an instance. (?) If they're "debugging" this
> issue, it means there's a misconfiguration or some unexpected state that
> they have to go find.

Depends on what you have in mind as a "user". If I launch an instance in
an AWS region, I'd be very surprised if the service told me there was
nowhere to place my instance, unless of course I'd asked it to launch an
instance with requirements that exceeded AWS' ability to launch.
If you're talking about a user of a private IT cloud with a single rack
of compute hosts, that user might very well expect to see a return of
"sorry mate, there's nowhere to put your request right now."
There is no explicit or implicit SLA or guarantee that Nova needs to
somehow create a place to put an instance when no such place exists to
put the instance.
Best,
-jay
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev