Hi all,

A few questions on the procedure here to recover a failed node: http://docs.basho.com/riak/kv/2.2.3/using/repair-recovery/failed-node/
We lost a production Riak server when AWS decided to delete a node, and we plan to use this procedure to replace it with a newly built node. A practice run in our QA environment has raised some questions.

- How can I tell when everything has synced up? I thought I could just monitor the handoffs, but these completed within 5 minutes of committing the cluster changes, while the data directories continued to grow rapidly for at least an hour. I assume this was data being synced to the new node, but how can I tell from the user level when it has completed? Or is it left up to AAE to sync the data?

- The bitcask directory on the 4 original nodes is ~10GB. On the new node this directory climbed to 1GB within an hour but hasn't moved much in the 4 days since. I know dead bitcask entries persist until the periodic compaction, but can it be right that bitcask on the original nodes is holding on to 90% of its disk space for dead data?

- Not directly related to the recovery procedure, but while one node of a five-node cluster is down, how is the extra load distributed within the cluster? It will still keep 3 copies of each entry, right? Are the copies that would have been on the missing node all stored on the next node in the ring, or distributed around the cluster?

Thanks in advance,
//Sean.
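
To make the last question concrete, here is a toy model of the scenario: five nodes, one down, 3 copies of each entry. It is only a sketch under stated assumptions, not Riak's actual claim or preference-list code. It assumes a 64-partition ring, partitions assigned to nodes round-robin, and that each copy belonging to the down node goes to the owner of the next partition past the primaries whose node is up. The node names and the helpers `owners`, `preflist`, and `fallback_load` are hypothetical.

```python
from collections import Counter

RING_SIZE = 64                           # assumed ring size
NODES = ["n1", "n2", "n3", "n4", "n5"]   # hypothetical node names
N_VAL = 3                                # copies kept of each entry


def owners(ring_size, nodes):
    """Assign partitions to nodes round-robin (a simplification)."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def preflist(start, ring, n_val, down):
    """Return (partition, serving_node, kind) for each of the n_val copies.

    The first n_val partitions from `start` are the primaries. A primary
    whose owner is down is served by the owner of the next partition
    further around the ring whose node is up (a fallback).
    """
    size = len(ring)
    walk = [(start + k) % size for k in range(size)]
    primaries, rest = walk[:n_val], walk[n_val:]
    # Lazily yields candidate fallback partitions owned by nodes that are up.
    candidates = (p for p in rest if ring[p] not in down)
    result = []
    for p in primaries:
        if ring[p] in down:
            result.append((p, ring[next(candidates)], "fallback"))
        else:
            result.append((p, ring[p], "primary"))
    return result


def fallback_load(ring, n_val, down):
    """Count how many fallback copies each surviving node picks up."""
    load = Counter()
    for start in range(len(ring)):
        for _, node, kind in preflist(start, ring, n_val, down):
            if kind == "fallback":
                load[node] += 1
    return load


ring = owners(RING_SIZE, NODES)
load = fallback_load(ring, N_VAL, down={"n3"})
print(dict(load))
```

In this model every preference list still has 3 entries, and the copies that belonged to the down node are spread over several of the surviving nodes rather than all landing on one neighbour. Whether a real cluster behaves the same way depends on how partitions were actually claimed.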
_______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com