On 08/04/2018 07:35 PM, Michael Glasgow wrote:
On 8/2/2018 7:27 PM, Jay Pipes wrote:
It's not an exception. It's the normal course of events. NoValidHosts means there were no compute nodes that met the requested resource amounts.

To clarify, I didn't mean a python exception.

Neither did I. I was referring to exceptional behaviour, not a Python exception.

I concede that I should've chosen a better word for the type of
object I have in mind.

If a SELECT statement against an Oracle DB returns 0 rows, is that an exception? No. Would an operator need to re-send the SELECT statement with an EXPLAIN SELECT in order to get information about what indexes were used to winnow the result set (to zero)? Yes. Either that, or the operator would need to gradually re-execute smaller SELECT statements containing fewer filters in order to determine which join or predicate caused a result set to contain zero rows.

I'm not sure if this analogy fully appreciates the perspective of the operator.  You're correct of course that if you select on a db and the correct answer is zero rows, then zero rows is the right answer, 100% of the time.

Whereas what I thought we meant when we talk about "debugging no valid host failures" is that zero rows is *not* the right answer, and yet you're getting zero rows anyway.

No, "debugging no valid host failures" doesn't mean that zero rows is the wrong answer. It means "find out why Nova thinks there's nowhere that my instance will fit".

So yes, absolutely with an Oracle DB you would get an ORA-XXXXX
exception in that case, along with a trace file that told you where
things went off the rails.  Which is exactly what we don't have
here.

That is precisely the opposite of what I was saying. Again, getting no results is *not* an error. It's normal behaviour and indicates there were no compute hosts that met the requirements of the request. This is not an error or exceptional behaviour. It's simply the result of a query against the placement database.

If you get zero rows returned, that means you need to determine what part of your request caused the winnowed result set to go from >0 rows to 0 rows.

And what we've been discussing is exactly the process by which such an investigation could be done. There are two options: do the investigation *inline* as part of the original request or do it *offline* after the original request returns 0 rows.

Doing it inline means splitting the large query we currently construct into multiple queries (for each related group of requested resources and/or traits) and logging the number of results grabbed for each of those queries.
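
To make the inline option concrete, here is a minimal sketch of the idea, not Nova's or placement's actual code. It filters an in-memory list of candidates one requested resource or trait at a time and records how many survive each step. The names Provider and winnow, the logger name, and the sample data are all hypothetical; the real service builds SQL queries against the placement database rather than filtering Python objects.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("placement_sketch")  # hypothetical logger name


@dataclass
class Provider:
    """Hypothetical stand-in for a compute host's resource provider."""
    name: str
    free: dict  # resource class -> unused amount
    traits: set = field(default_factory=set)


def winnow(providers, resources, required_traits):
    """Apply one filter per requested resource or trait.

    Returns the surviving providers and a list of (step label, survivor
    count) pairs, so the step that took the count to zero is visible.
    """
    steps = [
        (f"resource {rc}>={amount}",
         lambda p, rc=rc, amount=amount: p.free.get(rc, 0) >= amount)
        for rc, amount in resources.items()
    ]
    steps += [
        (f"trait {trait}", lambda p, trait=trait: trait in p.traits)
        for trait in sorted(required_traits)
    ]

    remaining = list(providers)
    counts = []
    for label, keep in steps:
        remaining = [p for p in remaining if keep(p)]
        counts.append((label, len(remaining)))
        log.debug("%s: %d candidates remain", label, len(remaining))
        if not remaining:
            break  # later filters cannot bring candidates back
    return remaining, counts


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    hosts = [
        Provider("a", {"VCPU": 4, "MEMORY_MB": 2048}, {"HW_CPU_X86_AVX2"}),
        Provider("b", {"VCPU": 8, "MEMORY_MB": 512}),
    ]
    # Nothing has 4096 MB free, so the MEMORY_MB step logs 0 candidates.
    winnow(hosts, {"VCPU": 2, "MEMORY_MB": 4096}, {"HW_CPU_X86_AVX2"})
```

The per-step counts are captured at the time of the original request, which is the property the offline approach cannot offer.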

Doing it offline means developing some diagnostic tool that an operator could run (similar to what Windriver did with [1]). The issue with that is that the diagnostic tool can only represent the resource usage at the time the diagnostic tool was run, not when the original request that returned 0 rows ran.

[1] https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-94f87e728df6465becce5241f3da53c8R330

If I understand your perspective correctly, it's basically that placement is working as designed, so there's nothing more to do except pore over debug output.  Can we consider:

  (1) that might not always be true if there are bugs

Bugs in the placement service are an entirely separate issue. They do occur, of course, but we're not talking about that here.

  (2) even when it is technically true, from the user's perspective, I'd posit that it's rare that a user requests an instance with the express intent of not launching an instance. (?)  If they're "debugging" this issue, it means there's a misconfiguration or some unexpected state that they have to go find.

Depends on what you have in mind as a "user". If I launch an instance in an AWS region, I'd be very surprised if the service told me there was nowhere to place my instance unless of course I'd asked it to launch an instance with requirements that exceeded AWS' ability to launch.

If you're talking about a user of a private IT cloud with a single rack of compute hosts, that user might very well expect to see a return of "sorry mate, there's nowhere to put your request right now.".

There is no explicit or implicit SLA or guarantee that Nova needs to somehow create a place to put an instance when no such place exists to put the instance.

Best,
-jay

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
