After thinking more, it does seem like we're doing something wrong if the query itself is returning 300k rows. :) I can take a better look at it in front of the computer later if no one beats me to it.
On Feb 25, 2013, at 9:28 PM, Chris Behrens <[email protected]> wrote: > Replying from my phone, so I can't look, but I wonder if we have an index > missing. > > On Feb 25, 2013, at 8:54 PM, Sam Morrison <[email protected]> wrote: > >> On Tue, Feb 26, 2013 at 3:15 PM, Sam Morrison <[email protected]> wrote: >>> >>> On 26/02/2013, at 2:15 PM, Chris Behrens <[email protected]> wrote: >>> >>>> >>>> On Feb 25, 2013, at 6:39 PM, Joe Gordon <[email protected]> wrote: >>>> >>>>> >>>>> It looks like the scheduler issues are related to the rabbitmq issues. >>>>> "host 'qh2-rcc77' ... is disabled or has not been heard from in a while" >>>>> >>>>> What does 'nova host-list' say? the clocks must all be synced up? >>>> >>>> Good things to check. It feels like something is spinning way too much >>>> within this filter, though. This can also cause the above message. The >>>> scheduler pulls all of the records before it starts filtering… and if >>>> there's a huge delay somewhere, it can start seeing a bunch of hosts as >>>> disabled. >>>> >>>> The filter doesn't look like a problem.. unless there's a large amount of >>>> aggregate metadata… and/or a large amount of key/values for the >>>> instance_type's extra specs. There *is* a DB call in the filter. If >>>> that's blocking for an extended period of time, the whole process is >>>> blocked… But I suspect by the '100% cpu' comment, that this is not the >>>> case… So the only thing I can think of is that it returns a tremendous >>>> amount of metadata. >>>> >>>> Adding some extra logging in the filter could be useful. >>>> >>>> - Chris >>> >>> Thanks Chris, I have 2 aggregates and 2 keys defined and each of the 80 >>> hosts has either one or the other. At the moment every flavour has either >>> one or the other too so I don't think it's too much data. >>> >>> I've tracked it down to this call: >>> >>> metadata = db.aggregate_metadata_get_by_host(context, host_state.host) >> >> More debugging has got it down to a query >> >> In db.api.aggregate_metadata_get_by_host: >> >> query = model_query(context, models.Aggregate).join( >> "_hosts").filter(models.AggregateHost.host == host).join( >> "_metadata") >> ...... >> rows = query.all() >> >> With query debug on this resolves to: >> >> SELECT aggregates.created_at AS aggregates_created_at, >> aggregates.updated_at AS aggregates_updated_at, aggregates.deleted_at >> AS aggregates_deleted_at, aggregates.deleted AS aggregates_deleted, >> aggregates.id AS aggregates_id, aggregates.name AS aggregates_name, >> aggregates.availability_zone AS aggregates_availability_zone, >> aggregate_hosts_1.created_at AS aggregate_hosts_1_created_at, >> aggregate_hosts_1.updated_at AS aggregate_hosts_1_updated_at, >> aggregate_hosts_1.deleted_at AS aggregate_hosts_1_deleted_at, >> aggregate_hosts_1.deleted AS aggregate_hosts_1_deleted, >> aggregate_hosts_1.id AS aggregate_hosts_1_id, aggregate_hosts_1.host >> AS aggregate_hosts_1_host, aggregate_hosts_1.aggregate_id AS >> aggregate_hosts_1_aggregate_id FROM aggregates INNER JOIN >> aggregate_hosts AS aggregate_hosts_2 ON aggregates.id = >> aggregate_hosts_2.aggregate_id AND aggregate_hosts_2.deleted = 0 AND >> aggregates.deleted = 0 INNER JOIN aggregate_hosts ON >> aggregate_hosts.aggregate_id = aggregates.id AND >> aggregate_hosts.deleted = 0 AND aggregates.deleted = 0 INNER JOIN >> aggregate_metadata AS aggregate_metadata_1 ON aggregates.id = >> aggregate_metadata_1.aggregate_id AND aggregate_metadata_1.deleted = 0 >> AND aggregates.deleted = 0 INNER JOIN aggregate_metadata ON >> aggregate_metadata.aggregate_id = aggregates.id AND >> aggregate_metadata.deleted = 0 AND aggregates.deleted = 0 LEFT OUTER >> JOIN aggregate_hosts AS aggregate_hosts_3 ON aggregates.id = >> aggregate_hosts_3.aggregate_id AND aggregate_hosts_3.deleted = 0 AND >> aggregates.deleted = 0 LEFT OUTER JOIN aggregate_hosts AS >> aggregate_hosts_1 ON aggregate_hosts_1.aggregate_id = aggregates.id >> AND aggregate_hosts_1.deleted = 0 AND aggregates.deleted = 0 WHERE >> aggregates.deleted = 0 AND aggregate_hosts.host = 'qh2-rcc34'; >> >> Which in our case returns 328509 rows in set (25.97 sec) >> >> Seems a bit off considering there are 80 rows in aggregate_hosts, 2 >> rows in aggregates and 2 rows in aggregate_metadata >> >> In the code rows is only equal to 1 so it seems to be doing something >> inside to code to do this? Don't know too much how sqlalchemy works. >> >> Seems like a bug to me? or maybe our database has something wrong in it? >> >> Cheers, >> Sam _______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : [email protected] Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp

