hi,

one issue i've noticed with our 'backlog' scheduling is that we register all our new measures in a single folder/filestore object. in most production cases this folder or object can grow quite large (tens/hundreds of thousands of entries). so that we don't load it all into memory, the drivers only grab the first x items and process them.

unfortunately, we don't control the ordering of the returned items, so it depends on whatever ordering the backend returns. for Ceph, it returns in what i guess is some alphanumeric order. the file driver i believe returns based on how the filesystem indexes files. i have no idea how swift ordering behaves. the result is that we may starve some new measures from being processed, because they keep getting pushed back by more recent measures if fewer agents are deployed.
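to make the starvation concrete, here's a toy illustration (made-up names and data, not the actual driver code) of what the first-x behaviour looks like when the backend hands back keys in alphanumeric order:

    # toy example: the driver only ever looks at the first x entries of
    # whatever ordering the backend gives us, so entries that sort late
    # keep getting skipped while earlier-sorting measures keep arriving.
    x = 4
    backlog = sorted(["metric-a/1", "metric-a/2", "metric-a/3",
                      "metric-a/4", "metric-z/1"])
    batch = backlog[:x]   # "metric-z/1" gets pushed back yet again
    print(batch)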
with that said, this isn't a huge issue because measures can be processed on demand using the refresh parameter, but it's not ideal.

i was thinking, to better handle processing while minimising the effects of a driver's natural ordering, we could hash our new measures into buckets based on metric_id. Gnocchi would hash all incoming metrics into 100? buckets, and metricd agents would divide up these buckets and loop through them. this would ensure we have smaller buckets to deal with and therefore less chance that metrics get pushed back and starved. that said, it adds the requirement of 100? folders/filestore objects rather than 1, and it also means we may be making significantly more, smaller fetches vs a single (possibly) giant fetch. (there's a rough sketch of the bucketing at the end of this mail.)

to extend this, we could also hash into project_id groups and thus allow some projects to have more workers and therefore more performant queries? this might be too product tailored. :)

thoughts?

cheers,

-- gord
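p.s. a very rough sketch of the bucketing idea, just to make it concrete. the function names, the md5 choice, and the bucket count are all made up here, not actual gnocchi code:

    import hashlib
    import uuid

    NUM_BUCKETS = 100  # the "100?" above

    def bucket_for(metric_id):
        # stable hash of the metric id -> bucket index, so a metric's
        # new measures always land in the same folder/filestore object
        h = hashlib.md5(str(metric_id).encode()).hexdigest()
        return int(h, 16) % NUM_BUCKETS

    def buckets_for_agent(agent_index, total_agents):
        # metricd agents split the buckets among themselves and loop
        # through their share, so each fetch is against a much smaller
        # listing than today's single giant backlog
        return [b for b in range(NUM_BUCKETS)
                if b % total_agents == agent_index]

    # e.g. agent 0 of 4 would loop over buckets 0, 4, 8, ...
    print(bucket_for(uuid.uuid4()))
    print(buckets_for_agent(0, 4)[:5])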
