New question #631136 on Graphite: https://answers.launchpad.net/graphite/+question/631136
I have the following setup: a physical load balancer (round robin) in front of
2 graphite-web frontends with identical configuration, both pointing at two
Graphite backend servers:

    CLUSTER_SERVERS = ["10.57.72.33:80", "10.57.72.34:80"]

Both backend servers hold the same data, for redundancy. The problem is that I
see very high I/O utilization with this configuration, and the following lines
in exceptions.log:

    Failed to join remote_fetch thread 10.57.72.33:80 within 6s
    Failed to join remote_fetch thread 10.57.72.34:80 within 6s

If I remove one server from CLUSTER_SERVERS, everything seems to work very
well. I am running graphite-web version 0.9.15.

Here is the complete conf for my frontend:

    SECRET_KEY = '?=exBKb/9J~m4B3re@P2Waa,`"H_e"x~'
    TIME_ZONE = 'CET'
    MEMCACHE_HOSTS = ['10.57.72.31:11211']
    DEFAULT_CACHE_DURATION = 600  # Cache images and data for 10 minutes
    STORAGE_DIR = '/var/opt/graphite/storage'
    LOG_DIR = '/opt/graphite/storage/log/webapp'
    CLUSTER_SERVERS = ["10.57.72.33:80", "10.57.72.34:80"]
    CARBONLINK_HOSTS = []

Here is the conf for the backends:

    SECRET_KEY = '?=exBKb/9J~m4B3re@P2Waa,`"H_e"x~'
    TIME_ZONE = 'CET'
    WHISPER_DIR = '/var/opt/graphite/storage/whisper'
    CARBONLINK_HOSTS = ["127.0.0.1:7102:w1", "127.0.0.1:7103:w2",
                        "127.0.0.1:7104:w3", "127.0.0.1:7105:w4",
                        "127.0.0.1:7106:w5", "127.0.0.1:7107:w6"]
    CARBONLINK_QUERY_BULK = True

Does anyone have any idea what could be causing this? It looks like a
configuration issue to me.

-- 
You received this question notification because your team graphite-dev is
an answer contact for Graphite.

_______________________________________________
Mailing list: https://launchpad.net/~graphite-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~graphite-dev
More help   : https://help.launchpad.net/ListHelp
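One observation on the "within 6s" in those exceptions: it matches the default remote-store fetch timeout in graphite-web 0.9.x, so the frontends are giving up on the backends rather than the backends failing outright. Note also that with two backends holding identical data, every render request fans out to both, which may account for the doubled I/O. A minimal sketch of raising the timeouts in the frontend local_settings.py, assuming the 0.9.x setting names (REMOTE_STORE_FETCH_TIMEOUT, REMOTE_STORE_FIND_TIMEOUT, REMOTE_STORE_RETRY_DELAY; verify the names and defaults against the settings.py shipped with your installed version):

```python
# local_settings.py (frontend) -- sketch, assuming 0.9.x setting names
CLUSTER_SERVERS = ["10.57.72.33:80", "10.57.72.34:80"]

# The 0.9.x defaults are low; raise them if the backends are slow to
# answer remote fetch/find requests under heavy I/O load.
REMOTE_STORE_FETCH_TIMEOUT = 30  # seconds to wait for remote render data
REMOTE_STORE_FIND_TIMEOUT = 15   # seconds to wait for remote metric finds
REMOTE_STORE_RETRY_DELAY = 60    # seconds before retrying an unresponsive backend
```

This only papers over the symptom if the backends are genuinely I/O-bound, but it can confirm whether the exceptions are a timeout problem rather than a connectivity one.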

