Hi Martin,

Thanks for taking the time.

Yes, by "size of the bitcask directory" I mean I did a "du -h --max-depth=1 bitcask", so I think that would cover all the vnodes. We don't use any other backends.

Those answers are helpful, will get back to this in a few days and see what I can determine about where our data physically lies. Might have more questions then.

Cheers,
//Sean.

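For anyone following along without GNU du, the per-vnode breakdown described above can be approximated with a short script. This is only a sketch: it assumes each immediate subdirectory of the bitcask data directory belongs to one vnode, the function names are made up for illustration, and it sums apparent file sizes rather than allocated disk blocks, so the numbers can differ slightly from what du reports.

```python
import os


def tree_size(path):
    """Sum the apparent sizes (in bytes) of all regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # do not follow or count symlinks
            try:
                total += os.path.getsize(full)
            except OSError:
                # A file may vanish mid-walk (e.g. removed by a merge); skip it.
                continue
    return total


def per_vnode_sizes(bitcask_root):
    """Map each immediate subdirectory of bitcask_root to its total size in bytes.

    Assumption: one subdirectory per vnode (partition).
    """
    sizes = {}
    for entry in os.scandir(bitcask_root):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = tree_size(entry.path)
    return sizes


def format_report(bitcask_root):
    """Return one line per subdirectory, largest first, sizes in MiB."""
    ranked = sorted(per_vnode_sizes(bitcask_root).items(),
                    key=lambda kv: kv[1], reverse=True)
    return [f"{size / 2**20:10.1f} MiB  {name}" for name, size in ranked]
```

Calling print("\n".join(format_report("bitcask"))) from the data directory's parent would list the subdirectories from largest to smallest, which makes it easier to see whether the space is spread evenly across vnodes.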
On Wed, Aug 8, 2018 at 6:05 PM, Martin Sumner <martin.sum...@adaptip.co.uk> wrote:

> Based on a quick read of the code, compaction in bitcask is performed only
> on "readable" files, and the current active file for writing is excluded
> from that list. With default settings, that active file can grow to 2GB.
> So it is possible that if objects had been replaced/deleted many times
> within the active file, that space will not be recovered if all the
> replacements amount to < 2GB per vnode. So at these small data sizes - you
> may get a relatively significant discrepancy between an old and recovered
> node in terms of disk space usage.
>
> On 8 August 2018 at 17:37, Martin Sumner <martin.sum...@adaptip.co.uk>
> wrote:
>
>> Sean,
>>
>> Some partial answers to your questions.
>>
>> I don't believe force-replace itself will sync anything up - it just
>> reassigns ownership (hence handoff happens very quickly).
>>
>> Read repair would synchronise a portion of the data. So if 10% of your
>> data is read regularly, this might explain some of what you see.
>>
>> AAE should also repair your data. But if nothing has happened for 4
>> days, then that doesn't seem to be the case. It would be worth checking
>> the aae-status page
>> (http://docs.basho.com/riak/kv/2.2.3/using/admin/riak-admin/#aae-status)
>> to confirm things are happening.
>>
>> I don't know if there are any minimum levels of data before bitcask will
>> perform compaction. There's nothing obvious in the code that wouldn't be
>> triggered way before 90%. I don't know if it will merge on the active file
>> (the one currently being written to), but that is 2GB max size (configured
>> through bitcask.max_file_size).
>>
>> When you say the size of the bitcask directory - is this the size shared
>> across all vnodes on the node? I guess if each vnode has a single file
>> <2GB, and there are multiple vnodes - something unexpected might happen
>> here? If bitcask does indeed not merge the file active for writing.
>>
>> In terms of distribution around the cluster, if you have an n_val of 3
>> you should normally expect to see a relatively even distribution of the
>> data on failure (certainly not it all going to one). Worst case scenario
>> is that 3 nodes get all the load from that one failed node.
>>
>> When a vnode is inaccessible, 3 (assuming n=3) fallback vnodes are
>> selected to handle the load for that 1 vnode (as that vnode would normally
>> be in 3 preflists, and commonly a different node will be asked to start a
>> vnode for each preflist).
>>
>> I will try and dig later into bitcask merge/compaction code, to see if I
>> spot anything else.
>>
>> Martin

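To put a rough number on the point about the active file being excluded from merges, the sketch below estimates how much data could sit in unmerged active files on a single node. It is a back-of-the-envelope upper bound, not a measurement: the 2GB figure is the default max file size quoted above, while the ring size of 64, the 5-node cluster, the even spread of vnodes, and the function name are assumptions made for illustration. The bound is on the total size of the active files; the dead (replaced or deleted) data inside them can only be smaller.

```python
import math

GIB = 2**30


def max_active_file_bytes(ring_size, node_count, max_file_size=2 * GIB):
    """Upper bound on bytes held in unmerged active files on the busiest node.

    Assumes vnodes are spread as evenly as possible across nodes and that
    each vnode has exactly one active file of at most max_file_size bytes.
    """
    vnodes_per_node = math.ceil(ring_size / node_count)
    return vnodes_per_node * max_file_size


# Illustrative numbers only: ring size 64 on a 5-node cluster.
bound = max_active_file_bytes(ring_size=64, node_count=5)
print(f"up to {bound / GIB:.0f} GiB may sit in active files on one node")
```

With those assumed numbers the busiest node owns 13 vnodes, so up to 26 GiB could be held in active files that a merge will not touch. On a node holding only a few GB in total, that is more than enough to account for a large difference between an old node and a freshly recovered one.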
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com