Hi All,

While the RetryScheduler may not have been designed specifically to fix this issue, https://bugs.launchpad.net/nova/+bug/1011852 suggests that it is meant to fix it, at least if "it" is a scheduler race condition, which is my suspicion.
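For clarity on what I think that mechanism does: my rough mental model of a RetryFilter-style filter is simply "skip any host already attempted for this request". Something like the simplified sketch below (my own illustration, not the actual nova code; the 'retry'/'hosts' structure is my assumption about what the scheduler passes around):

    # Simplified sketch (not actual nova code) of what I understand a
    # RetryFilter-style filter to do: skip any host that has already been
    # attempted for this request, so a reschedule never lands back on a
    # host that just failed.
    def host_passes(host_name, filter_properties):
        retry = filter_properties.get('retry')
        if not retry:
            # Retries disabled, or first pass through the scheduler.
            return True
        return host_name not in retry.get('hosts', [])

    # e.g. after failed attempts on nova-23 and nova-24:
    props = {'retry': {'num_attempts': 3, 'hosts': ['nova-23', 'nova-24']}}
    print(host_passes('nova-23', props))  # False -- already failed here
    print(host_passes('nova-25', props))  # True  -- still eligible

If that's right, it only helps after a placement has already failed, which would fit the "pile everything onto one node first, then reschedule" behaviour I describe below.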
This is my current scheduler config, which gives the failure mode I describe:

scheduler_available_filters=nova.scheduler.filters.standard_filters
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter,RetryFilter
scheduler_max_attempts=30
least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
compute_fill_first_cost_fn_weight=1.0
cpu_allocation_ratio=1.0
ram_allocation_ratio=1.0

I'm running the scheduler and API server on a single controller host. Whether I submit a single API request to start many instances (using euca2ools) or run a shell loop around "nova boot" to generate one API request per server, the scheduler is pretty consistent about placing more than a hundred instances on one node at first and then iteratively rescheduling them elsewhere.

The cpu_allocation_ratio should limit the scheduler to 24 instances per compute node regardless of how it's calculating memory: 24 physical cores times a ratio of 1.0 is 24 vCPUs, and m1.tiny is a single vCPU. So while I talked a lot about memory allocation as a motivation, in my deployment CPU is more often the actual limiting factor, and here it certainly should be. And yet, after attempting to launch 200 m1.tiny instances:

root@nimbus-0:~# nova-manage service describe_resource nova-23
2012-10-31 11:17:56
HOST     PROJECT                           cpu  mem(mb)  hdd
nova-23  (total)                            24    48295  882
nova-23  (used_now)                        107    56832   30
nova-23  (used_max)                        107    56320   30
nova-23  98333a1a28e746fa8c629c83a818ad57  106    54272    0
nova-23  3008a142e9524f7295b06ea811908f93    1     2048   30

Eventually those bleed off to other systems, though not entirely:

2012-10-31 11:29:41
HOST     PROJECT                           cpu  mem(mb)  hdd
nova-23  (total)                            24    48295  882
nova-23  (used_now)                         43    24064   30
nova-23  (used_max)                         43    23552   30
nova-23  98333a1a28e746fa8c629c83a818ad57   42    21504    0
nova-23  3008a142e9524f7295b06ea811908f93    1     2048   30

At this point, 12 minutes later, out of 200 instances 168 are active, 22 are errored, and 10 are still "building". Notably, only 23 actual VMs are running on nova-23:

root@nova-23:~# virsh list | grep instance | wc -l
23

So that's what I see; perhaps my assumptions about why I'm seeing it are incorrect.

Thanks,
-Jon
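P.S. For what it's worth, the check I expect a CoreFilter-style filter to make is roughly the simplified sketch below (my own illustration with made-up parameter names, not the actual nova code). With 24 cores, cpu_allocation_ratio=1.0, and 1-vCPU m1.tiny instances it should refuse the 25th instance on a node, unless concurrent requests all see the same stale vCPU usage, which is the race I suspect:

    # Simplified sketch (not actual nova code) of the vCPU headroom check
    # I expect a CoreFilter-style filter to make for each candidate host.
    def core_filter_passes(vcpus_total, vcpus_used, cpu_allocation_ratio,
                           instance_vcpus):
        limit = vcpus_total * cpu_allocation_ratio
        return (limit - vcpus_used) >= instance_vcpus

    # nova-23: 24 cores, ratio 1.0, m1.tiny needs 1 vCPU
    print(core_filter_passes(24, 23, 1.0, 1))  # True  -- the 24th instance fits
    print(core_filter_passes(24, 24, 1.0, 1))  # False -- the 25th should be refused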