New question #225623 on Graphite: https://answers.launchpad.net/graphite/+question/225623
We've recently added a lot more metrics to our Graphite instance and it seems to be having some performance issues. Mostly, some of our Diamond collectors occasionally error out when sending to Graphite. We've tweaked a few settings here and there, and there's no obvious disk I/O load or anything like that. I checked the console log but didn't find anything indicating it ran out of file descriptors.

The next thing we thought of was to add more carbon caches. Right now it's set up as an HA-style relay configuration with 2 Graphite nodes. We wanted to add another carbon-cache to each of these nodes in order to boost performance.

I guess the first question is: is this even a valid idea? Should we explore other avenues? Everything is set up to relay, so I'm not sure whether we need to switch to consistent hashing, or what other changes to the config might be necessary. I've got our configs below. Any advice on how to proceed would be appreciated!

[cache]
LOCAL_DATA_DIR = /opt/graphite/storage/whisper/
USER = www-data
PID_DIR = /var/run/graphite
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 7500
MAX_CREATES_PER_MINUTE = 50
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7012
LOG_UPDATES = False

[relay]
# Relay setup
RELAY_METHOD = rules
DESTINATIONS = graphite1:2014, graphite2:2014
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2004
MAX_QUEUE_SIZE = 100000

relay-rules.conf:

[default]
default = true
destinations = graphite1:2014, graphite2:2014

I came up with an experimental config that adds a [cache:b] section and lists the new instances in the relay destinations accordingly, but without consistent hashing it seemed like this wouldn't be doing anything useful. What I tried was something like this:

[cache]
LOCAL_DATA_DIR = /opt/graphite/storage/whisper/
USER = www-data
PID_DIR = /var/run/graphite
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 7500
MAX_CREATES_PER_MINUTE = 50
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7012
LOG_UPDATES = False
WHISPER_LOCK_WRITES = True

[cache:b]
LINE_RECEIVER_PORT = 2113
PICKLE_RECEIVER_PORT = 2114
CACHE_QUERY_PORT = 7112

[relay]
# Relay setup
RELAY_METHOD = rules
DESTINATIONS = graphite1:2014, graphite2:2014, graphite1:2114, graphite2:2114
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2004
MAX_QUEUE_SIZE = 100000

relay-rules.conf:

[default]
default = true
destinations = graphite1:2014, graphite2:2014, graphite1:2114, graphite2:2114

However, I'm really not sure how useful that is. It seems like it would just duplicate the data further without actually giving any performance improvement anywhere. Any ideas on how to proceed?
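The duplication concern raised in the question can be illustrated with a toy model. The sketch below is not carbon's relay code; it only assumes that a single catch-all rule forwards every metric to every listed destination, and compares that with a simplified hash ring that picks one destination per metric. The function names (route_by_rules, build_ring, route_by_hash, count_writes), the metric names, and the ring details (MD5, 100 points per destination) are all made up for illustration and may differ from what carbon actually does.

```python
import hashlib
from bisect import bisect_left

# Destinations taken from the experimental config in the question.
DESTINATIONS = [
    "graphite1:2014",
    "graphite2:2014",
    "graphite1:2114",
    "graphite2:2114",
]


def route_by_rules(metric, destinations):
    """Toy model of one catch-all rule: every metric goes to every destination."""
    return list(destinations)


def _position(key):
    """Map a string to a point on the ring (assumption: MD5, first 32 bits)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)


def build_ring(destinations, points_per_destination=100):
    """Place several points per destination on a sorted ring."""
    entries = sorted(
        (_position(f"{dest}:{i}"), dest)
        for dest in destinations
        for i in range(points_per_destination)
    )
    positions = [pos for pos, _ in entries]
    owners = [dest for _, dest in entries]
    return positions, owners


def route_by_hash(metric, ring):
    """Pick the single destination that owns the metric's point on the ring."""
    positions, owners = ring
    index = bisect_left(positions, _position(metric)) % len(positions)
    return [owners[index]]


def count_writes(metrics, router):
    """Count how many datapoints each destination would receive."""
    counts = {}
    for metric in metrics:
        for dest in router(metric):
            counts[dest] = counts.get(dest, 0) + 1
    return counts


# Hypothetical metric names, one datapoint each.
metrics = [f"servers.host{i}.cpu.load" for i in range(1000)]
ring = build_ring(DESTINATIONS)

rules_counts = count_writes(metrics, lambda m: route_by_rules(m, DESTINATIONS))
hashed_counts = count_writes(metrics, lambda m: route_by_hash(m, ring))

print("rules :", rules_counts)
print("hashed:", hashed_counts)
```

In this model the catch-all rule makes every cache instance handle every datapoint, so adding instances adds work rather than spreading it, while the hashed router splits the same datapoints across the instances. The sketch does not model keeping one full copy per host for HA, so whether a hashing setup fits that layout is a separate question it does not settle.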

