New question #252730 on Graphite:
https://answers.launchpad.net/graphite/+question/252730

Has any thought been put into the ability to retry or rehash destinations when
a backend carbon daemon goes down?

My concern is that in a cluster setup, there is potential for data loss when 
storage boxes (wherever your carbon daemons run) go down for any reason.

For example:

If I had a relay on one server receiving 100k metrics that were then being 
consistently hashed to 4 relays on other servers, it seems like there is a 
potential for loss. 

100k metrics across 4 boxes is ~25k each;
say MAX_QUEUE_SIZE is 20k.

Box 4 goes down, so the primary relay starts to cache its metrics, up to
MAX_QUEUE_SIZE.

Box 3 goes down, so the primary relay starts to cache up to MAX_QUEUE_SIZE for
this box too, but the cache is already full.

Then, depending on how flow control is configured, metrics are potentially
dropped on the floor as sockets are ignored.

MAX_QUEUE_SIZE seems to be useful only when sending relatively small
quantities of metrics; at higher volumes it could fill very quickly.
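
To put rough numbers on the scenario above, here is a small back-of-the-envelope
sketch. It assumes each metric sends one datapoint per reporting interval and
that MAX_QUEUE_SIZE counts queued datapoints for a single destination; the
function name is made up for illustration and is not part of carbon.

```python
def intervals_until_full(total_metrics, boxes, max_queue_size):
    """Estimate how many reporting intervals a per-destination queue
    survives once its destination stops accepting data.

    Assumption: metrics are spread evenly and each metric produces one
    datapoint per interval.
    """
    datapoints_per_box = total_metrics / boxes
    return max_queue_size / datapoints_per_box


# 100k metrics over 4 boxes with MAX_QUEUE_SIZE = 20k
print(intervals_until_full(100_000, 4, 20_000))
```

Under those assumptions the queue for a dead box fills in less than one
reporting interval, so anything longer than a very brief outage loses data.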

In the above case, I would hope for the primary relay to recognize the status 
of the daemons on boxes 3 and 4 and rehash their metrics to 1 and 2 so that 
there is no data loss.
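
As a rough illustration of the rehash behaviour described above, here is a toy
consistent hash ring that skips destinations marked as down. This is a sketch
only: the HashRing class, its replicas parameter, and the box names are
hypothetical and are not carbon's own hashing code.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring (illustrative, not carbon's implementation)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at several positions.
        self.ring = sorted(
            (self._position(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )

    @staticmethod
    def _position(key):
        # Use the first 32 bits of the MD5 digest as the ring position.
        return int(hashlib.md5(key.encode()).hexdigest()[:8], 16)

    def get_node(self, key, down=frozenset()):
        """Return the first live node clockwise from the key's position."""
        start = bisect.bisect_left(self.ring, (self._position(key), ""))
        for offset in range(len(self.ring)):
            _, node = self.ring[(start + offset) % len(self.ring)]
            if node not in down:
                return node
        raise RuntimeError("no live destinations")


ring = HashRing(["box1", "box2", "box3", "box4"])
metric = "servers.web01.cpu.load"
print(ring.get_node(metric))                          # normal placement
print(ring.get_node(metric, down={"box3", "box4"}))   # rehashed if needed
```

Metrics whose usual box is still up stay where they are; only the metrics that
belonged to boxes 3 and 4 move to boxes 1 and 2. The catch is that the moved
metrics end up with data split across two boxes once the dead box returns.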

Has anyone worked out a better solution for a larger-scale cluster setup? Are
there plans to add a retry, or a command for the relays to rehash based on a
modified destination list? I would imagine this would require some sort of
"inaccessible destinations list" that a destination could be added to in order
to filter out unresponsive carbon daemons. I would also think you would need
some form of health check against the destinations to determine whether they
are viable candidates for writes, which would then update the aforementioned
inaccessible list.
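
One way the destination check and the list of unresponsive daemons could fit
together is sketched below. Everything here is an assumption for illustration:
DestinationTracker, tcp_probe, and max_failures are hypothetical names, and a
plain TCP connect is only a stand-in for whatever health check would really be
appropriate.

```python
import socket


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class DestinationTracker:
    """Tracks which destinations are currently considered inaccessible."""

    def __init__(self, destinations, probe=tcp_probe, max_failures=3):
        self.destinations = list(destinations)  # (host, port) tuples
        self.probe = probe
        self.max_failures = max_failures
        self.failures = {dest: 0 for dest in self.destinations}
        self.inaccessible = set()

    def check_all(self):
        """Probe every destination once and update the inaccessible set."""
        for dest in self.destinations:
            if self.probe(*dest):
                self.failures[dest] = 0
                self.inaccessible.discard(dest)
            else:
                self.failures[dest] += 1
                if self.failures[dest] >= self.max_failures:
                    self.inaccessible.add(dest)
        return self.inaccessible

    def live(self):
        """Destinations that are currently viable candidates for writes."""
        return [d for d in self.destinations if d not in self.inaccessible]
```

Requiring several consecutive failures before marking a destination as down is
meant to avoid rehashing on a single dropped probe; the relay would call
check_all() periodically and hash only across live().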

Just wanted to put the question out there before I try to write something.


