Excellent and informative explanation, thank you very much. We’re very happy that our adjustments have returned the cluster to its normal operating parameters. Also glad that Riak 2 will be handling this stuff programmatically, as prior to your spreadsheet and explanation it was pure voodoo for us. I think the automation will significantly decrease the number of animal sacrifices needed to appease the levelDB gods! :)
Sean McKibben

On Jan 10, 2014, at 9:18 AM, Matthew Von-Maszewski <matth...@basho.com> wrote:

> Attached is the spreadsheet I used for deriving the cache_size and max_open_files. The general guidelines of the spreadsheet are:
>
> vnode count: ring size divided by (number of nodes minus one)
> write_buf_min/max: don't touch … you will screw up my leveldb tuning
> cache_size: 8Mbytes is the hard minimum
> max_open_files: this is NOT a file count in 1.4. It is 4Mbytes times the value. The file cache is metadata-size based, not file-count based.
>
> Lower cache_size and raise max_open_files as necessary to keep "remaining" close to zero AND cover your total file metadata size.
>
> What is file metadata size? I looked at one vnode's LOG file for rough estimates:
>
> - Your total file count was 1,479 in one vnode.
> - You typically hit the 75,000 key limit.
> - Dividing a typical file size by the key count (75,000) gives 496 bytes … used 496 as the average value size.
> - Block_size is 4096. The 496-byte value size goes into the block size about 10 times (no need for fractions, since block_size is a threshold, not a fixed value).
> - 75,000 total keys in a file at 10 keys per block … that means 7,500 keys in the file's index … at 100 bytes per key, that is 750,000 bytes of keys in the index.
> - The bloom filter is 2 bytes per key (all 75,000 keys), or 150,000 bytes.
> - Metadata loaded into the file cache is therefore 750,000 + 150,000 bytes per file, or 900,000 bytes.
> - 900,000 bytes per file times 1,479 files is 1,331,100,000 bytes of file cache needed …
>
> Your original max_open_files of 315 is 1,321,205,760 bytes in size (315 * 4Mbytes) … the file cache is thrashing, since 1,321,205,760 is less than 1,331,100,000.
>
> I told you 425 as a max_open_files setting; the spreadsheet has 400 as a more conservative number.
>
> Matthew
>
> <leveldb_sizing_1.4.push.xls>
>
> On Jan 10, 2014, at 9:41 AM, Martin May <mar...@push.io> wrote:
>
>> Hi Matthew,
>>
>> We applied this change to node 4, started it up, and it seems much happier (no crazy CPU). We’re going to keep an eye on it for a little while, and then apply this setting to all the other nodes as well.
>>
>> Is there anything we can do to prevent this scenario in the future, or should the settings you suggested take care of that?
>>
>> Thanks,
>> Martin
>>
>> On Jan 10, 2014, at 6:42 AM, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>
>>> Sean,
>>>
>>> I did some math based upon the app.config and LOG files. I am guessing that you are starting to thrash your file cache.
>>>
>>> This theory should be easy to prove / disprove. On that one node, change the cache_size and max_open_files to:
>>>
>>> cache_size 68435456
>>> max_open_files 425
>>>
>>> If I am correct, the node should come up and not cause problems. We are trading block cache space for file cache space. A miss in the file cache is far more costly than a miss in the block cache.
>>>
>>> Let me know how this works for you. It is possible that we might want to talk about raising your block size slightly to reduce file cache overhead.
>>>
>>> Matthew
>>>
>>> On Jan 9, 2014, at 9:33 PM, Sean McKibben <grap...@graphex.com> wrote:
>>>
>>>> We have a 5-node cluster using eleveldb (1.4.2) and 2i, and this afternoon it started responding extremely slowly. CPU on member 4 was extremely high, and restarting that process didn’t help. We temporarily shut down member 4 and cluster speed returned to normal, but as soon as we boot member 4 back up, the cluster performance goes to shit.
>>>>
>>>> We’ve run into this before, but back then (before we migrated to this bare-metal cluster) we were able to just wipe the machines and start with a fresh set of data. Now it is causing some pretty significant issues and we’re not sure what we can do to get it back to normal. Many of our queues are filling up, and we’ll probably have to take node 4 offline again just so we can provide a regular quality of service.
>>>>
>>>> We’ve turned off AAE on node 4, but it hasn’t helped. We have some transfers that need to happen, but they are going very slowly.
>>>>
>>>> 'riak-admin top' on node 4 reports this:
>>>>
>>>> Load:  cpu    610    Memory:  total      503852    binary    231544
>>>>        procs  804             processes  179850    code       11588
>>>>        runq   134             atom          533    ets         4581
>>>>
>>>> Pid             Name or Initial Func   Time   Reds     Memory    MsgQ  Current Function
>>>> ---------------------------------------------------------------------------------------------------
>>>> <6175.29048.3>  proc_lib:init_p/5      '-'    462231   51356760  0     mochijson2:json_bin_is_safe/1
>>>> <6175.12281.6>  proc_lib:init_p/5      '-'    307183   64195856  1     gen_fsm:loop/7
>>>> <6175.1581.5>   proc_lib:init_p/5      '-'    286143   41085600  0     mochijson2:json_bin_is_safe/1
>>>> <6175.6659.0>   proc_lib:init_p/5      '-'    281845   13752     0     sext:decode_binary/3
>>>> <6175.6666.0>   proc_lib:init_p/5      '-'    209113   21648     0     sext:decode_binary/3
>>>> <6175.12219.6>  proc_lib:init_p/5      '-'    168832   16829200  0     riak_client:wait_for_query_results/4
>>>> <6175.8403.0>   proc_lib:init_p/5      '-'    133333   13880     1     eleveldb:iterator_move/2
>>>> <6175.8813.0>   proc_lib:init_p/5      '-'    119548   9000      1     eleveldb:iterator/3
>>>> <6175.8411.0>   proc_lib:init_p/5      '-'    115759   34472     0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
>>>> <6175.5679.0>   proc_lib:init_p/5      '-'    109577   8952      0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
>>>> Output server crashed: connection_lost
>>>>
>>>> Based on that, is there anything anyone can think of to try to bring performance back into the land of usability? Does this appear to be something that may have been resolved in 1.4.6 or 1.4.7?
>>>>
>>>> The only thing we can think of at this point is to remove (or force-remove) the member and join a freshly built one, but the last time we attempted that (on a different cluster), our secondary indexes were irreparably damaged and only regained consistency when we copied every individual key to (this) new cluster! Not a good experience :( but I’m hopeful that 1.4.6 may have addressed some of our issues.
>>>>
>>>> Any help is appreciated.
>>>>
>>>> Thank you,
>>>> Sean McKibben
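
A quick aside for anyone who wants to sanity-check Matthew's file-cache arithmetic: the back-of-the-envelope below walks the same numbers in Erlang. The module and function names are illustrative only, not part of Riak or leveldb, and every input value comes straight from Matthew's message above.

    %% Reproduces Matthew's file-cache sizing estimate from the thread.
    -module(leveldb_cache_math).
    -export([check/0]).

    check() ->
        KeysPerFile  = 75000,                        %% compaction key limit per .sst file
        KeysPerBlock = 10,                           %% Matthew's working estimate for 496-byte
                                                     %% values against a 4096-byte block threshold
        IndexKeys    = KeysPerFile div KeysPerBlock, %% 7,500 keys in the file's index
        IndexBytes   = IndexKeys * 100,              %% ~100 bytes per index key -> 750,000
        BloomBytes   = KeysPerFile * 2,              %% bloom filter, 2 bytes per key -> 150,000
        PerFileBytes = IndexBytes + BloomBytes,      %% 900,000 bytes of metadata per file
        TotalBytes   = PerFileBytes * 1479,          %% 1,479 files -> 1,331,100,000 bytes
        FourMB       = 4 * 1024 * 1024,
        %% For each candidate max_open_files, does 4 Mbytes * value cover the metadata?
        [{MaxOpenFiles, MaxOpenFiles * FourMB, MaxOpenFiles * FourMB >= TotalBytes}
         || MaxOpenFiles <- [315, 400, 425]].

Calling leveldb_cache_math:check() returns [{315,1321205760,false},{400,1677721600,true},{425,1782579200,true}]: the original 315 falls just short of the 1,331,100,000 bytes of metadata, which matches the thrashing diagnosis, while both 400 and 425 cover it.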
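
For completeness, the suggested values belong in the eleveldb section of app.config on each node. A minimal sketch of that section, assuming the rest of it is left untouched (cache_size is in bytes per vnode; max_open_files, per Matthew's note, counts 4-Mbyte units under 1.4):

    %% app.config, eleveldb section: values from Matthew's suggestion
    {eleveldb, [
        %% ... existing eleveldb settings unchanged ...
        {cache_size, 68435456},      %% block cache, in bytes per vnode
        {max_open_files, 425}        %% file cache: 425 * 4 Mbytes of metadata in 1.4
    ]}

Each node needs a restart for the new values to take effect, which matches the node-by-node rollout Martin describes above.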