After thinking more, it does seem like we're doing something wrong if the query 
itself is returning 300k rows. :)  I can take a better look at it in front of 
the computer later if no one beats me to it.
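
One back-of-the-envelope guess, purely speculative until someone checks the data: the SQL Sam posted below joins aggregate_hosts three extra times, constrained only by aggregate_id. If one of the two aggregates happens to hold 69 of the 80 hosts, those three unconstrained joins by themselves would give

    69 * 69 * 69 = 328509

which is exactly the row count reported below. If that's right, the duplicated joins are the culprit rather than anything wrong with the data.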

On Feb 25, 2013, at 9:28 PM, Chris Behrens <[email protected]> wrote:

> Replying from my phone, so I can't look, but I wonder if we have an index 
> missing.
> 
> On Feb 25, 2013, at 8:54 PM, Sam Morrison <[email protected]> wrote:
> 
>> On Tue, Feb 26, 2013 at 3:15 PM, Sam Morrison <[email protected]> wrote:
>>> 
>>> On 26/02/2013, at 2:15 PM, Chris Behrens <[email protected]> wrote:
>>> 
>>>> 
>>>> On Feb 25, 2013, at 6:39 PM, Joe Gordon <[email protected]> wrote:
>>>> 
>>>>> 
>>>>> It looks like the scheduler issues are related to the rabbitmq issues: 
>>>>> "host 'qh2-rcc77' ... is disabled or has not been heard from in a while"
>>>>> 
>>>>> What does 'nova host-list' say?   Are the clocks all synced up?
>>>> 
>>>> Good things to check.  It feels like something is spinning way too much 
>>>> within this filter, though.  This can also cause the above message.  The 
>>>> scheduler pulls all of the records before it starts filtering… and if 
>>>> there's a huge delay somewhere, it can start seeing a bunch of hosts as 
>>>> disabled.
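>>>>
>>>> Roughly the shape of that liveness check (not the actual nova code, just to 
>>>> illustrate why a long stall shows up as "disabled or has not been heard 
>>>> from in a while"):
>>>>
>>>>   from datetime import datetime, timedelta
>>>>
>>>>   def looks_up(last_heartbeat, service_down_time=60):
>>>>       # a host reads as "down" once the service's last report is older
>>>>       # than service_down_time, so if the scheduler itself stalls for
>>>>       # longer than that, every host suddenly looks dead at once
>>>>       return datetime.utcnow() - last_heartbeat <= timedelta(seconds=service_down_time)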
>>>> 
>>>> The filter doesn't look like a problem… unless there's a large amount of 
>>>> aggregate metadata… and/or a large amount of key/values for the 
>>>> instance_type's extra specs.   There *is* a DB call in the filter.  If 
>>>> that's blocking for an extended period of time, the whole process is 
>>>> blocked…  But I suspect, from the '100% cpu' comment, that this is not the 
>>>> case…  So the only thing I can think of is that it returns a tremendous 
>>>> amount of metadata.
>>>> 
>>>> Adding some extra logging in the filter could be useful.
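>>>>
>>>> Something as crude as this around the DB call would tell us whether it's 
>>>> the query or the metadata volume (a quick sketch; LOG/variable names are 
>>>> from memory, so adjust to whatever the filter actually uses):
>>>>
>>>>   import time
>>>>
>>>>   start = time.time()
>>>>   metadata = db.aggregate_metadata_get_by_host(context, host_state.host)
>>>>   LOG.debug("aggregate_metadata_get_by_host(%s): %d keys in %.3fs",
>>>>             host_state.host, len(metadata), time.time() - start)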
>>>> 
>>>> - Chris
>>> 
>>> Thanks Chris. I have 2 aggregates and 2 keys defined, and each of the 80 
>>> hosts has either one or the other. At the moment every flavour has either 
>>> one or the other too, so I don't think it's too much data.
>>> 
>>> I've tracked it down to this call:
>>> 
>>> metadata = db.aggregate_metadata_get_by_host(context, host_state.host)
>> 
>> More debugging has narrowed it down to a single query.
>> 
>> In db.api.aggregate_metadata_get_by_host:
>> 
>>   query = model_query(context, models.Aggregate).join(
>>           "_hosts").filter(models.AggregateHost.host == host).join(
>>           "_metadata")
>>   ......
>>   rows = query.all()
>> 
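>> (By "query debug" I just mean SQLAlchemy's engine logging, roughly:
>> 
>>   import logging
>>   logging.basicConfig()
>>   logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
>> 
>> which logs every SQL statement as it's emitted.)
>> 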
>> With query debug on this resolves to:
>> 
>> SELECT aggregates.created_at AS aggregates_created_at,
>>        aggregates.updated_at AS aggregates_updated_at,
>>        aggregates.deleted_at AS aggregates_deleted_at,
>>        aggregates.deleted AS aggregates_deleted,
>>        aggregates.id AS aggregates_id,
>>        aggregates.name AS aggregates_name,
>>        aggregates.availability_zone AS aggregates_availability_zone,
>>        aggregate_hosts_1.created_at AS aggregate_hosts_1_created_at,
>>        aggregate_hosts_1.updated_at AS aggregate_hosts_1_updated_at,
>>        aggregate_hosts_1.deleted_at AS aggregate_hosts_1_deleted_at,
>>        aggregate_hosts_1.deleted AS aggregate_hosts_1_deleted,
>>        aggregate_hosts_1.id AS aggregate_hosts_1_id,
>>        aggregate_hosts_1.host AS aggregate_hosts_1_host,
>>        aggregate_hosts_1.aggregate_id AS aggregate_hosts_1_aggregate_id
>> FROM aggregates
>> INNER JOIN aggregate_hosts AS aggregate_hosts_2
>>         ON aggregates.id = aggregate_hosts_2.aggregate_id
>>        AND aggregate_hosts_2.deleted = 0 AND aggregates.deleted = 0
>> INNER JOIN aggregate_hosts
>>         ON aggregate_hosts.aggregate_id = aggregates.id
>>        AND aggregate_hosts.deleted = 0 AND aggregates.deleted = 0
>> INNER JOIN aggregate_metadata AS aggregate_metadata_1
>>         ON aggregates.id = aggregate_metadata_1.aggregate_id
>>        AND aggregate_metadata_1.deleted = 0 AND aggregates.deleted = 0
>> INNER JOIN aggregate_metadata
>>         ON aggregate_metadata.aggregate_id = aggregates.id
>>        AND aggregate_metadata.deleted = 0 AND aggregates.deleted = 0
>> LEFT OUTER JOIN aggregate_hosts AS aggregate_hosts_3
>>         ON aggregates.id = aggregate_hosts_3.aggregate_id
>>        AND aggregate_hosts_3.deleted = 0 AND aggregates.deleted = 0
>> LEFT OUTER JOIN aggregate_hosts AS aggregate_hosts_1
>>         ON aggregate_hosts_1.aggregate_id = aggregates.id
>>        AND aggregate_hosts_1.deleted = 0 AND aggregates.deleted = 0
>> WHERE aggregates.deleted = 0 AND aggregate_hosts.host = 'qh2-rcc34';
>> 
>> Which in our case returns 328509 rows in set (25.97 sec)
>> 
>> That seems a bit off considering there are only 80 rows in aggregate_hosts,
>> 2 rows in aggregates and 2 rows in aggregate_metadata.
>> 
>> In the code, rows only ends up with 1 entry, so it seems to be doing
>> something inside the code to collapse all of that? I don't know too much
>> about how SQLAlchemy works.
>> 
>> Seems like a bug to me? Or maybe our database has something wrong in it?
>> 
>> Cheers,
>> Sam
