New question #254080 on Graphite:
https://answers.launchpad.net/graphite/+question/254080
We have a 3-server Graphite cluster, version 0.9.12, and on one of the
servers I see a steady stream of entries in the exception log like the following:
Thu Sep 04 19:28:36 2014 :: ('127.0.0.1', 7002)
None
Thu Sep 04 19:28:36 2014 :: ('127.0.0.1', 7202)
None
Looking through the code, it appears this means that the function "recv_exactly"
in render/datalib.py is not getting the expected amount of data from the carbon
cache. Aside from the fact that the message could be a little clearer (hint
hint...), I am at a loss to explain what is happening.
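For readers unfamiliar with this kind of helper, here is a minimal Python 3 sketch of what a "read exactly N bytes" function typically does. It is not the code from render/datalib.py; the exception type and the wording of the message are assumptions made for illustration.

```python
import socket


def recv_exactly(conn, num_bytes):
    """Read exactly num_bytes from conn, or raise if the peer closes early.

    Illustrative sketch only; not the Graphite implementation.
    """
    chunks = []
    remaining = num_bytes
    while remaining > 0:
        # recv() may return fewer bytes than requested, so keep looping.
        data = conn.recv(remaining)
        if not data:
            # An empty read means the other side closed the connection.
            raise ConnectionError(
                "connection closed with %d of %d bytes still expected"
                % (remaining, num_bytes)
            )
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Simulate a peer that sends 5 bytes and then closes the connection.
    sender, receiver = socket.socketpair()
    sender.sendall(b"hello")
    sender.close()

    print(recv_exactly(receiver, 5))
    try:
        recv_exactly(receiver, 5)
    except ConnectionError as exc:
        print("short read:", exc)
    receiver.close()
```

An error message that reports how many bytes were expected and how many arrived, as in this sketch, would make a short read much easier to recognize in the exception log.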
Connections are being made to the cache query port, I can telnet to it, and
Graphite appears to be working correctly. We DO have a lot of traffic on the
web servers (they are behind a load balancer, so the loads are mostly equal).
We do occasionally see the last few data points missing on some graph lines.
The other servers have identical configurations, both carbon and webapp, and we
don't see these messages on the other two. Our architecture is as follows:
Each server has:
* 2 relays (Level 1) configured to send via consistent hashing to 2 other
  relays on all 3 servers.
* 2 relays (Level 2) configured to send via consistent hashing to the carbon
  caches on the local host.
* 4 caches.
This way the consistent hash ring that the web app uses (for accessing caches
on the local host) will match the ring that carbon-relay uses.
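The point about the two rings matching can be illustrated with a toy consistent hash ring. This is a simplified sketch, not carbon's own ring implementation: the HashRing class, the replica count, the cache labels and the metric names are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring (hypothetical; not carbon's implementation)."""

    def __init__(self, nodes, replicas=100):
        # Place several virtual points per node on the ring for balance.
        self._ring = sorted(
            (self._position("%s:%d" % (node, i)), node)
            for node in nodes
            for i in range(replicas)
        )
        self._positions = [position for position, _ in self._ring]

    @staticmethod
    def _position(key):
        # Map a string to an integer position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def get_node(self, key):
        # Pick the first ring entry at or after the key, wrapping around.
        index = bisect.bisect_left(self._positions, self._position(key))
        return self._ring[index % len(self._ring)][1]


# Hypothetical labels for the 4 local caches.
caches = ["cache-a", "cache-b", "cache-c", "cache-d"]

relay_ring = HashRing(caches)    # ring built by the Level 2 relay
webapp_ring = HashRing(caches)   # ring built by the web app

metrics = ["servers.web%02d.cpu.user" % n for n in range(20)]
same = all(relay_ring.get_node(m) == webapp_ring.get_node(m) for m in metrics)
print("rings agree on every metric:", same)
```

When both sides build the ring from the same node list, every metric maps to the same cache, so the web app queries the cache that actually holds the most recent points. If the lists differed, some metrics would be looked up on a cache that never received them.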
So my question is twofold:
* What causes these errors?
* How do I fix it?
Thanks,
Steve Keller