Hi Joe,

On 26/02/2013, at 1:39 PM, Joe Gordon <j...@cloudscaling.com> wrote:

> 
> 
> On Mon, Feb 25, 2013 at 6:14 PM, Sam Morrison <sorri...@gmail.com> wrote:
> Hi Joe,
> 
> On 26/02/2013, at 11:19 AM, Joe Gordon <j...@cloudscaling.com> wrote:
> 
>> On Sun, Feb 24, 2013 at 3:31 PM, Sam Morrison <sorri...@gmail.com> wrote:
>> I have been playing with the AggregateInstanceExtraSpecs filter and can't 
>> get it to work.
>> 
>> In our staging environment it works fine with 4 compute nodes, which I have 
>> split into 2 aggregates.
>> 
>> When I try to do the same in our production environment which has 80 compute 
>> nodes (splitting them again into 2 aggregates) it doesn't work.
>> 
>> nova-scheduler becomes very slow. I scheduled an instance and gave up 
>> after 5 minutes; it was taking ages and the host was at 100% CPU. 
>> There were also about 500 unacknowledged messages in rabbit.
>> 
>> 
>> What does the nova-scheduler log say? Where are the unacknowledged rabbitmq 
>> messages sent from?
> 
> Logs are below. Note the large time gap between host selections; this is 
> pretty much instantaneous without this filter.
> 
> I can't figure out how to inspect an unacknowledged message in rabbit, but my 
> guess is that they are the compute service updates from all the compute nodes. 
> These aren't being processed, and I think this is why the scheduling attempts 
> further down are rejected with "is disabled or has not been heard from in a 
> while".
> 
> Do you see anything that could be an issue? Flags we use for scheduler are 
> below also:
> 
> Thanks for your help,
> Sam
> 
> 
> It looks like the scheduler issues are related to the rabbitmq issues.   
> "host 'qh2-rcc77' ... is disabled or has not been heard from in a while"
> 
> What does 'nova host-list' say? Are the clocks all synced up?
>  

Yeah, all the clocks are synced up fine. Running nova-manage service list shows 
:-) for every service, and the updated-at timestamps are correct.

We only have one nova-scheduler. It locks up and sits at 100% CPU. While this 
is happening, nova-scheduler seems to take the compute service updates off the 
queue but doesn't ack them and, going by the logs, doesn't process them. This 
is why I suspect the hosts are eventually rejected with a "not been heard from 
in a while" message. 
I believe that is only a symptom, though. The real issue is nova-scheduler 
locking up: it seems to take 30-60 seconds per host to determine whether the 
host passes the filters.
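
To illustrate why I think stalled updates lead to that rejection, here is a rough sketch of a heartbeat staleness check. This is my understanding rather than the actual nova code: the function name, the 60 second threshold and the timestamps are all assumptions made up for the example.

```python
from datetime import datetime, timedelta

# Assumption: stands in for a "service down time" threshold in seconds.
SERVICE_DOWN_TIME = 60


def service_is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Return True if the last heartbeat is within down_time seconds of now."""
    elapsed = abs((now - last_heartbeat).total_seconds())
    return elapsed <= down_time


# Made-up timestamps: one recent heartbeat, one that is five minutes old.
now = datetime(2013, 2, 26, 14, 0, 0)
fresh = now - timedelta(seconds=10)
stale = now - timedelta(seconds=300)

print(service_is_up(fresh, now))
print(service_is_up(stale, now))
```

If the scheduler only sees heartbeats it has actually processed, a host whose updates are stuck behind a busy scheduler would look like the stale case even though the host itself is healthy.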

Does that make sense? Any other ideas on how to debug? 
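
One thing I could try is timing the filter call for each host to confirm where the 30-60 seconds goes. A minimal sketch of the idea follows; time_filter and fake_filter are hypothetical helpers, not part of nova, and the second host name is invented for the example.

```python
import time


def time_filter(filter_fn, hosts, clock=time.perf_counter):
    """Run filter_fn on each host; return (passed_hosts, seconds_per_host)."""
    passed, timings = [], {}
    for host in hosts:
        start = clock()
        ok = filter_fn(host)
        timings[host] = clock() - start
        if ok:
            passed.append(host)
    return passed, timings


# Hypothetical stand-in for a scheduler filter's per-host check.
def fake_filter(host):
    return host.endswith("7")


passed, timings = time_filter(fake_filter, ["qh2-rcc76", "qh2-rcc77"])
print(passed)
for host, seconds in sorted(timings.items()):
    print(host, "%.6f s" % seconds)
```

Wrapping the real filter this way should show whether the time is spent inside the filter itself or somewhere else in the scheduling loop.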

Cheers,
Sam

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
