Question #187874 on Graphite changed:
https://answers.launchpad.net/graphite/+question/187874
Status: Open => Answered
Michael Leinartas proposed the following answer:
carbon-aggregator is more likely to be CPU-bound, since for both rewrites and
aggregation it runs multiple regex matches against every incoming metric.
As a start, you'll want to run multiple carbon-aggregators. If you're
only using carbon-aggregator for rewrites (i.e. your aggregation-rules.conf
is empty), you can run multiple carbon-aggregators pointed at the same
carbon-cache and spread the load across them using carbon-relay or haproxy.
If you *are* using aggregation, the problem is more complex: each aggregator
will emit the same metric name and timestamp, and one will overwrite the
other. For this, you have a few options:
* Place carbon-relay in front of the aggregators in relay-rules mode and shard
  the data by metric path. You'll need to ensure that none of your aggregation
  rules combine metrics across shards.
* Send the output of multiple aggregators to another aggregator with rules that
  aggregate the aggregated values (the 'from' and 'to' regexes will be
  identical). Note that this requires trunk, as 0.9.9 has a bug when the 'from'
  and 'to' rules are identical.
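
To make the overwrite problem and the two options concrete, here is a rough
toy simulation in Python. It is not carbon's code: the rule (sum every
app.<host>.requests into app.all.requests), the metric names, and the helper
functions are all hypothetical, and carbon-cache is modeled as a plain dict
where the last write to a (name, timestamp) pair wins.

```python
from collections import defaultdict
import zlib

# Toy model: each datapoint is (metric_name, timestamp, value).
# Hypothetical rule: sum "app.<host>.requests" into "app.all.requests".

def aggregate(datapoints):
    """Sum values per (output_metric, timestamp), as one aggregator would."""
    sums = defaultdict(float)
    for name, ts, value in datapoints:
        app, _host, leaf = name.split(".")
        sums[(f"{app}.all.{leaf}", ts)] += value
    return dict(sums)

def store(outputs_per_aggregator):
    """Model the cache: a later write to the same (name, timestamp) wins."""
    db = {}
    for outputs in outputs_per_aggregator:
        db.update(outputs)
    return db

def shard_by_output(datapoints, n):
    """Route so every input of one aggregate reaches the same aggregator."""
    shards = [[] for _ in range(n)]
    for name, ts, value in datapoints:
        app, _host, leaf = name.split(".")
        key = f"{app}.{leaf}".encode()
        shards[zlib.crc32(key) % n].append((name, ts, value))
    return shards

points = [
    ("app.web1.requests", 60, 10.0),
    ("app.web2.requests", 60, 20.0),
    ("app.web3.requests", 60, 30.0),
    ("app.web4.requests", 60, 40.0),
]
key = ("app.all.requests", 60)

# Plain load balancing: each aggregator emits a partial sum under the same
# name and timestamp, so the second write replaces the first.
round_robin = [points[0::2], points[1::2]]
naive = store(aggregate(s) for s in round_robin)

# Option 1: shard by path so a single aggregator sees all inputs of the rule.
sharded = store(aggregate(s) for s in shard_by_output(points, 2))

# Option 2: a second-tier aggregator sums the partial sums; note that its
# input and output metric names are identical.
partials = [(name, ts, v)
            for s in round_robin
            for (name, ts), v in aggregate(s).items()]
two_tier = aggregate(partials)

print(naive[key], sharded[key], two_tier[key])
```

With plain load balancing the stored value is only one aggregator's partial
sum, while both options recover the full total. Keep in mind that the two-tier
approach composes cleanly for sums; an average of averages is only correct
when every aggregator saw the same number of datapoints.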
If you're using 0.9.9, I'd suggest applying this patch, since you're
queuing on the client side and the queue-draining behavior is somewhat
broken in 0.9.9:
http://bazaar.launchpad.net/~graphite-dev/graphite/main/revision/671
Also consider the fact that you're on EC2: the behavior you're seeing may be
triggered by a slowdown of your instance (other customers, etc). The above
patch and an increase in your MAX_QUEUE_SIZE may allow you to ride out a
slowdown.
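
For a rough sense of how much a larger queue buys you, a back-of-the-envelope
calculation like the following may help. It assumes MAX_QUEUE_SIZE is counted
in datapoints, and the rates are made-up numbers; substitute your own.

```python
def seconds_of_headroom(max_queue_size, incoming_per_sec, drained_per_sec=0.0):
    """Time until a queue of max_queue_size datapoints fills up."""
    net = incoming_per_sec - drained_per_sec
    if net <= 0:
        return float("inf")  # draining keeps up, so the queue never fills
    return max_queue_size / net

# Hypothetical: 10,000-datapoint queue, 2,000 datapoints/s arriving,
# receiver completely stalled.
print(seconds_of_headroom(10_000, 2_000))  # prints 5.0

# Same inflow, larger queue, receiver still draining 500 datapoints/s.
print(seconds_of_headroom(100_000, 2_000, 500))
```

If the slowdowns you see last longer than the headroom this gives, raising
the queue size alone won't be enough.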
--
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.