By the way, I think that the number of repaired keys is pretty high:
2014-01-03 06:33:42.857 [info] <0.31440.2586>@riak_kv_exchange_fsm:key_exchange:206 Repaired 1491787 keys during active anti-entropy exchange of {468137243207554840987117797979434404733540892672,3} between {473846233978378680511350941857232385279071879168,'riak@192.168.20.112'} and {479555224749202520035584085735030365824602865664,'riak@192.168.20.107'}

I have a few, but consistent, lines like this (one every two hours, while
this process is running).

Best regards.
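P.S. In case it's useful to anyone following along, a rough way to see how
many keys AAE has repaired in total is to sum those "Repaired N keys" lines
from console.log. Something like the following escript should do it (just a
sketch, untested against my own logs; the path in the usage line is only an
example):

    #!/usr/bin/env escript
    %% Sketch only: sum the "Repaired N keys" counts that
    %% riak_kv_exchange_fsm writes to the log, to get a rough total of
    %% keys repaired by AAE exchanges.
    main([Path]) ->
        {ok, Log} = file:read_file(Path),
        Counts =
            case re:run(Log, "Repaired ([0-9]+) keys",
                        [global, {capture, all_but_first, list}]) of
                {match, Matches} -> [list_to_integer(N) || [N] <- Matches];
                nomatch          -> []
            end,
        io:format("~b exchange lines, ~b keys repaired in total~n",
                  [length(Counts), lists:sum(Counts)]);
    main(_) ->
        io:format("usage: sum_repairs.escript /var/log/riak/console.log~n").

It should print the number of matching exchange lines and the total key
count, which makes it easier to tell whether the repair volume really is as
high as it looks.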
On 2 January 2014 10:05, Edgar Veiga <edgarmve...@gmail.com> wrote:

> This is the only thing related to AAE that exists in my app.config. I
> haven't changed any of the default values...
>
>     %% Enable active anti-entropy subsystem + optional debug messages:
>     %% {anti_entropy, {on|off, []}},
>     %% {anti_entropy, {on|off, [debug]}},
>     {anti_entropy, {on, []}},
>
>     %% Restrict how fast AAE can build hash trees. Building the tree
>     %% for a given partition requires a full scan over that partition's
>     %% data. Once built, trees stay built until they are expired.
>     %% Config is of the form:
>     %%   {num-builds, per-timespan-in-milliseconds}
>     %% Default is 1 build per hour.
>     {anti_entropy_build_limit, {1, 3600000}},
>
>     %% Determine how often hash trees are expired after being built.
>     %% Periodically expiring a hash tree ensures the on-disk hash tree
>     %% data stays consistent with the actual k/v backend data. It also
>     %% helps Riak identify silent disk failures and bit rot. However,
>     %% expiration is not needed for normal AAE operation and should be
>     %% infrequent for performance reasons. The time is specified in
>     %% milliseconds. The default is 1 week.
>     {anti_entropy_expire, 604800000},
>
>     %% Limit how many AAE exchanges/builds can happen concurrently.
>     {anti_entropy_concurrency, 2},
>
>     %% The tick determines how often the AAE manager looks for work
>     %% to do (building/expiring trees, triggering exchanges, etc).
>     %% The default is every 15 seconds. Lowering this value will
>     %% speed up the rate at which all replicas are synced across the
>     %% cluster. Increasing the value is not recommended.
>     {anti_entropy_tick, 15000},
>
>     %% The directory where AAE hash trees are stored.
>     {anti_entropy_data_dir, "/var/lib/riak/anti_entropy"},
>
>     %% The LevelDB options used by AAE to generate the LevelDB-backed
>     %% on-disk hashtrees.
>     {anti_entropy_leveldb_opts, [{write_buffer_size, 4194304},
>                                  {max_open_files, 20}]},
>
> I'll update the bloom filter value and see what happens...
>
> It's Thursday again, and the regeneration process has started again. Since
> I updated to 1.4.6 there is one more difference: the get/put values for
> each cluster node now show a "random" behaviour. Take a look at this
> screenshot:
>
> https://cloudup.com/cgbu9VNhSo1
>
> Best regards
>
>
> On 31 December 2013 21:16, Charlie Voiselle <cvoise...@basho.com> wrote:
>
>> Edgar:
>>
>> Could you attach the AAE section of your app.config? I'd like to look
>> into this issue further for you. Something I think you might be running
>> into is https://github.com/basho/riak_core/pull/483.
>>
>> The issue of concern is that the LevelDB bloom filter is not enabled
>> properly for the instance into which the AAE data is stored. You can
>> mitigate this particular issue by adding {use_bloomfilter, true} as
>> shown below:
>>
>>     %% The LevelDB options used by AAE to generate the LevelDB-backed
>>     %% on-disk hashtrees.
>>     {anti_entropy_leveldb_opts, [{write_buffer_size, 4194304},
>>                                  {max_open_files, 20}]},
>>
>> Becomes:
>>
>>     %% The LevelDB options used by AAE to generate the LevelDB-backed
>>     %% on-disk hashtrees.
>>     {anti_entropy_leveldb_opts, [{write_buffer_size, 4194304},
>>                                  {use_bloomfilter, true},
>>                                  {max_open_files, 20}]},
>>
>> This might not solve your specific problem, but it will certainly improve
>> your AAE performance.
>>
>> Thanks,
>> Charlie Voiselle
>>
>> On Dec 31, 2013, at 12:04 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>
>> Hey guys!
>>
>> Nothing on this one?
>>
>> Btw: Happy new year :)
>>
>>
>> On 27 December 2013 22:35, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>
>>> This is a du -hs * of the riak folder:
>>>
>>> 44G   anti_entropy
>>> 1.1M  kv_vnode
>>> 252G  leveldb
>>> 124K  ring
>>>
>>> It's a 6-machine cluster, so ~1512G of LevelDB in total.
>>>
>>> Thanks for the tip, I'll upgrade in the near future!
>>>
>>> Best regards
>>>
>>>
>>> On 27 December 2013 21:41, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>
>>>> I have a query out to the developer who can better respond to your
>>>> follow-up questions. It might be Monday before we get a reply due to the
>>>> holidays.
>>>>
>>>> Do you happen to know how much data is in the leveldb dataset and/or
>>>> one vnode? Not sure it will change the response, but it might be nice to
>>>> have that info available.
>>>>
>>>> Matthew
>>>>
>>>> P.S. Unrelated to your question: Riak 1.4.4 is available for
>>>> download. It has a couple of nice bug fixes for leveldb.
>>>>
>>>>
>>>> On Dec 27, 2013, at 2:08 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>
>>>> Ok, thanks for confirming!
>>>>
>>>> Is it normal that this action affects the overall state of the cluster?
>>>> On the 26th the regeneration started and the response times of the
>>>> cluster rose to values never seen before. It was a day of heavy traffic,
>>>> but everything was going quite ok until the regeneration process
>>>> started...
>>>>
>>>> Do you have any advice about changing those app.config values? My
>>>> cluster has been running smoothly for the past 6 months and I don't want
>>>> to start all over again :)
>>>>
>>>> Best Regards
>>>>
>>>>
>>>> On 27 December 2013 18:56, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>
>>>>> Yes. Confirmed.
>>>>>
>>>>> There are options available in app.config to control how often this
>>>>> occurs and how many vnodes rehash at once: the defaults are every 7 days
>>>>> and two vnodes per server at a time.
>>>>>
>>>>> Matthew Von-Maszewski
>>>>>
>>>>>
>>>>> On Dec 27, 2013, at 13:50, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> I've been trying to find out what may be the cause of this.
>>>>>
>>>>> Once a week, all the nodes in my riak cluster start doing some kind of
>>>>> operation that lasts for at least two days.
>>>>>
>>>>> You can see a sample of my munin graphs for the last week here:
>>>>>
>>>>> https://cloudup.com/imWiBwaC6fm
>>>>>
>>>>> Take a look at days 19 and 20; now it has started again on the 26th...
>>>>>
>>>>> I suspect this may be caused by the AAE hash trees being regenerated,
>>>>> as your documentation says:
>>>>>
>>>>>   For added protection, Riak periodically (default: once a week) clears
>>>>>   and regenerates all hash trees from the on-disk K/V data.
>>>>>
>>>>> Can you confirm that this may be the root of the "problem", and whether
>>>>> it is normal for the operation to last two days?
>>>>>
>>>>> I'm using Riak 1.4.2 on 6 machines, with CentOS. The backend is LevelDB.
>>>>>
>>>>> Best Regards,
>>>>> Edgar Veiga
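One last note, mostly for the archives: as far as I understand it, the
options Matthew refers to are the anti_entropy_* settings already shown in
my app.config above. If the weekly regeneration keeps hurting response
times, a gentler configuration of the riak_kv section might look something
like this (just a sketch; the values are my own guesses, not something
recommended in this thread):

    %% Sketch only: reduce the impact of the periodic hash tree regeneration.
    %% These keys are the same ones documented in the default app.config
    %% above; the values below are guesses, not recommendations.
    {riak_kv, [
        %% Keep the default build throttle: at most 1 tree build per hour.
        {anti_entropy_build_limit, {1, 3600000}},

        %% Expire trees every 14 days instead of the default 7, so the full
        %% regeneration runs half as often (14 * 24 * 3600 * 1000 ms).
        {anti_entropy_expire, 1209600000},

        %% Allow only 1 concurrent exchange/build per node instead of the
        %% default 2, trading slower syncing for lower load during rebuilds.
        {anti_entropy_concurrency, 1}
    ]}.

If I've misunderstood how these settings interact, corrections are very
welcome.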
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com