New question #254080 on Graphite:
https://answers.launchpad.net/graphite/+question/254080

We have a 3-server Graphite cluster, version 0.9.12, and on one of the
servers I see a steady stream of entries in the exception log, as follows:

Thu Sep 04 19:28:36 2014 :: ('127.0.0.1', 7002)
None
Thu Sep 04 19:28:36 2014 :: ('127.0.0.1', 7202)
None

Looking through the code it appears this means that the function "recv_exactly" 
in render/datalib.py is not getting the expected amount of data from the carbon 
cache.  Aside from the fact that the message could be a little clearer (hint 
hint...), I am at a loss to explain what is happening.
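
To make the failure mode concrete, here is a minimal sketch of an
exact-length socket read. It is not the code from render/datalib.py, only an
illustration of the technique in Python 3. The 4-byte big-endian length
prefix in the demo is an assumption about the framing, and the error message
text is invented for the example.

```python
import socket
import struct


def recv_exactly(conn, num_bytes):
    """Read exactly num_bytes from conn, or raise if the peer closes early."""
    buf = b""
    while len(buf) < num_bytes:
        # recv() may return fewer bytes than requested, so keep asking
        # for whatever is still missing.
        chunk = conn.recv(num_bytes - len(buf))
        if not chunk:
            # An empty read means the other side closed the connection.
            raise ConnectionError(
                "connection closed after %d of %d bytes" % (len(buf), num_bytes)
            )
        buf += chunk
    return buf


if __name__ == "__main__":
    # Demo over a local socket pair; the length-prefixed framing is assumed.
    left, right = socket.socketpair()
    payload = b"hello"
    left.sendall(struct.pack("!L", len(payload)) + payload)
    left.close()
    (size,) = struct.unpack("!L", recv_exactly(right, 4))
    print(recv_exactly(right, size))
    right.close()
```

Raising an error that states how many bytes arrived versus how many were
expected would make a log entry like the one above much easier to interpret.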

Connections are being made to the cache query port, I can telnet to it, and 
Graphite appears to be working correctly.  We DO have a lot of traffic on the 
web servers (they are behind a load balancer, so the load is roughly equal).
We do occasionally see the last few data points missing from some graph
lines.

The other servers have identical configurations, both carbon and webapp, and we 
don't see these messages on the other two.  Our architecture is as follows:

Each server has:
2 relays (Level 1) configured to send via consistent hashing to 2 other relays 
on all 3 servers.
2 relays (Level 2) configured to send via consistent hashing to the carbon 
caches on the local host
4 caches.

This way the consistent hash ring that the web app uses (for accessing caches 
on the local host) will match the ring that carbon-relay uses.  
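
The reasoning behind matching rings can be sketched with a generic
consistent hash ring. This is not carbon's actual hashing code; the class
name, the replica count, the hash function, and the node strings (including
ports 7102 and 7302 and the instance letters) are all hypothetical. It only
shows that two rings built from the same set of nodes route every metric to
the same node.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring (illustrative, not carbon's own)."""

    def __init__(self, nodes, replicas=100):
        # Place several virtual points per node on the ring so that keys
        # spread evenly across nodes.
        self._ring = sorted(
            (self._position("%s:%d" % (node, i)), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        # Map a string to a 32-bit position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def get_node(self, key):
        # First ring point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._keys, self._position(key))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    # Hypothetical local cache instances.
    caches = [
        "127.0.0.1:7002:a",
        "127.0.0.1:7102:b",
        "127.0.0.1:7202:c",
        "127.0.0.1:7302:d",
    ]
    relay_ring = HashRing(caches)
    webapp_ring = HashRing(list(reversed(caches)))  # same set, other order
    metric = "servers.web01.cpu.user"
    print(relay_ring.get_node(metric) == webapp_ring.get_node(metric))
```

If the two sides were built from different node lists, some metrics would
hash to a different cache on the web app side than on the relay side.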

So my question is twofold:

* What causes these errors?
* How do I fix them?

Thanks,
Steve Keller


-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.
