Replying from my phone, so I can't look, but I wonder if we have an index 
missing.

On Feb 25, 2013, at 8:54 PM, Sam Morrison <[email protected]> wrote:

> On Tue, Feb 26, 2013 at 3:15 PM, Sam Morrison <[email protected]> wrote:
>> 
>> On 26/02/2013, at 2:15 PM, Chris Behrens <[email protected]> wrote:
>> 
>>> 
>>> On Feb 25, 2013, at 6:39 PM, Joe Gordon <[email protected]> wrote:
>>> 
>>>> 
>>>> It looks like the scheduler issues are related to the rabbitmq issues:
>>>> "host 'qh2-rcc77' ... is disabled or has not been heard from in a while"
>>>> 
>>>> What does 'nova host-list' say? And are the clocks all synced up?
>>> 
>>> Good things to check.  It feels like something is spinning way too much 
>>> within this filter, though.  This can also cause the above message.  The 
>>> scheduler pulls all of the records before it starts filtering… and if 
>>> there's a huge delay somewhere, it can start seeing a bunch of hosts as 
>>> disabled.
>>> 
>>> The filter doesn't look like a problem… unless there's a large amount of 
>>> aggregate metadata and/or a large number of key/values for the 
>>> instance_type's extra specs.  There *is* a DB call in the filter.  If 
>>> that's blocking for an extended period of time, the whole process is 
>>> blocked…  But I suspect from the '100% cpu' comment that this is not the 
>>> case…  So the only thing I can think of is that it returns a tremendous 
>>> amount of metadata.
>>> 
>>> Adding some extra logging in the filter could be useful.
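>>>
>>> Something like this around the DB call in the filter's host_passes() 
>>> would show both how long the call takes and how much it returns (a rough 
>>> sketch from memory -- the import paths and the surrounding filter code 
>>> may differ slightly):
>>>
>>>     import time
>>>
>>>     from nova import db
>>>     from nova.openstack.common import log as logging
>>>
>>>     LOG = logging.getLogger(__name__)
>>>
>>>     # inside the filter's host_passes(self, host_state, filter_properties)
>>>     context = filter_properties['context'].elevated()
>>>     start = time.time()
>>>     metadata = db.aggregate_metadata_get_by_host(context, host_state.host)
>>>     LOG.debug("aggregate_metadata_get_by_host(%s): %.3fs, %d keys",
>>>               host_state.host, time.time() - start, len(metadata))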
>>> 
>>> - Chris
>> 
>> Thanks Chris. I have 2 aggregates and 2 keys defined, and each of the 80 
>> hosts is in one aggregate or the other. At the moment every flavour uses 
>> one key or the other too, so I don't think it's too much data.
>> 
>> I've tracked it down to this call:
>> 
>> metadata = db.aggregate_metadata_get_by_host(context, host_state.host)
> 
> More debugging has narrowed it down to a query.
> 
> In db.api.aggregate_metadata_get_by_host:
> 
>     query = model_query(context, models.Aggregate).join(
>             "_hosts").filter(models.AggregateHost.host == host).join(
>             "_metadata")
>     ...
>     rows = query.all()
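>
> (Side note: printing the query object, e.g.
>
>     print str(query)
>
> shows the generated SELECT as well, just with bound-parameter placeholders
> instead of the literal host name.)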
> 
> With query debug on this resolves to:
> 
> SELECT aggregates.created_at AS aggregates_created_at,
>        aggregates.updated_at AS aggregates_updated_at,
>        aggregates.deleted_at AS aggregates_deleted_at,
>        aggregates.deleted AS aggregates_deleted,
>        aggregates.id AS aggregates_id,
>        aggregates.name AS aggregates_name,
>        aggregates.availability_zone AS aggregates_availability_zone,
>        aggregate_hosts_1.created_at AS aggregate_hosts_1_created_at,
>        aggregate_hosts_1.updated_at AS aggregate_hosts_1_updated_at,
>        aggregate_hosts_1.deleted_at AS aggregate_hosts_1_deleted_at,
>        aggregate_hosts_1.deleted AS aggregate_hosts_1_deleted,
>        aggregate_hosts_1.id AS aggregate_hosts_1_id,
>        aggregate_hosts_1.host AS aggregate_hosts_1_host,
>        aggregate_hosts_1.aggregate_id AS aggregate_hosts_1_aggregate_id
> FROM aggregates
> INNER JOIN aggregate_hosts AS aggregate_hosts_2
>     ON aggregates.id = aggregate_hosts_2.aggregate_id
>     AND aggregate_hosts_2.deleted = 0 AND aggregates.deleted = 0
> INNER JOIN aggregate_hosts
>     ON aggregate_hosts.aggregate_id = aggregates.id
>     AND aggregate_hosts.deleted = 0 AND aggregates.deleted = 0
> INNER JOIN aggregate_metadata AS aggregate_metadata_1
>     ON aggregates.id = aggregate_metadata_1.aggregate_id
>     AND aggregate_metadata_1.deleted = 0 AND aggregates.deleted = 0
> INNER JOIN aggregate_metadata
>     ON aggregate_metadata.aggregate_id = aggregates.id
>     AND aggregate_metadata.deleted = 0 AND aggregates.deleted = 0
> LEFT OUTER JOIN aggregate_hosts AS aggregate_hosts_3
>     ON aggregates.id = aggregate_hosts_3.aggregate_id
>     AND aggregate_hosts_3.deleted = 0 AND aggregates.deleted = 0
> LEFT OUTER JOIN aggregate_hosts AS aggregate_hosts_1
>     ON aggregate_hosts_1.aggregate_id = aggregates.id
>     AND aggregate_hosts_1.deleted = 0 AND aggregates.deleted = 0
> WHERE aggregates.deleted = 0 AND aggregate_hosts.host = 'qh2-rcc34';
> 
> Which in our case returns 328509 rows in set (25.97 sec)
> 
> That seems a bit off, considering there are only 80 rows in aggregate_hosts,
> 2 rows in aggregates and 2 rows in aggregate_metadata.
> 
> In the code, rows only ends up equal to 1, so it seems to be doing
> something inside the code to collapse this back down? I don't know too
> much about how sqlalchemy works.
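>
> Counting it out, the number does sort of add up: aggregate_hosts is joined
> in four times and aggregate_metadata twice, and only one of the
> aggregate_hosts joins is pinned down by the WHERE clause. So for the one
> matching aggregate you get roughly hosts^3 * metadata_rows^2 result rows.
> If the aggregate containing qh2-rcc34 has 69 of our 80 hosts and a single
> metadata row, that's 69 * 69 * 69 = 328509, which matches exactly.
> Presumably sqlalchemy then collapses the duplicates back into the one
> matching Aggregate object, which would explain why rows == 1 in the code
> even though MySQL ground through 300k+ rows.
>
> Something like the following would fetch the same information without
> multiplying the joins. This is just an untested sketch, not the actual
> nova code -- it assumes the AggregateMetadata/AggregateHost models and
> their key/value/aggregate_id columns match the tables in the SQL above:
>
>     import collections
>
>     # one row per (aggregate, metadata key) pair for this host
>     rows = (model_query(context, models.AggregateMetadata)
>             .join(models.Aggregate,
>                   models.Aggregate.id ==
>                   models.AggregateMetadata.aggregate_id)
>             .join(models.AggregateHost,
>                   models.AggregateHost.aggregate_id == models.Aggregate.id)
>             .filter(models.AggregateHost.host == host)
>             .all())
>
>     metadata = collections.defaultdict(set)
>     for row in rows:
>         metadata[row.key].add(row.value)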
> 
> Seems like a bug to me? Or maybe our database has something wrong in it?
> 
> Cheers,
> Sam

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
