Excellent and informative explanation, thank you very much. We’re very happy that our adjustments have returned the cluster to its normal operating parameters. Also glad that Riak 2 will be handling this stuff programmatically, as prior to your spreadsheet and explanation it was pure voodoo for us. I think the automation will significantly decrease the number of animal sacrifices needed to appease the levelDB gods! :)
Sean McKibben

On Jan 10, 2014, at 9:18 AM, Matthew Von-Maszewski <matth...@basho.com> wrote:

> Attached is the spreadsheet I used for deriving the cache_size and max_open_files. The general guidelines of the spreadsheet are:
>
> vnode count: ring size divided by (number of nodes minus one)
> write_buf_min/max: don't touch … you will screw up my leveldb tuning
> cache_size: 8Mbytes is the hard minimum
> max_open_files: this is NOT a file count in 1.4. It is 4Mbytes times the value. The file cache is metadata-size based, not file-count based.
>
> Lower cache_size and raise max_open_files as necessary to keep "remaining" close to zero AND cover your total file metadata size.
>
> What is file metadata size? I looked at one vnode's LOG file for rough estimates:
>
> - Your total file count was 1,479 in one vnode.
> - You typically hit the 75,000 key limit.
> - Dividing a typical file size by the key count (75,000) gives 496 bytes … used 496 as the average value size.
> - Block_size is 4096. The 496-byte value size goes into the block size about 10 times (no need for fractions, since block_size is a threshold, not a fixed value).
> - 75,000 total keys in a file at 10 keys per block … that means 7,500 keys in the file's index … at 100 bytes per key, that is 750,000 bytes of keys in the index.
> - The bloom filter is 2 bytes per key (all 75,000 keys), or 150,000 bytes.
> - Metadata loaded into the file cache is therefore 750,000 + 150,000 bytes per file, or 900,000 bytes.
> - 900,000 bytes per file times 1,479 files is 1,331,100,000 bytes of file cache needed …
>
> Your original max_open_files of 315 is 1,321,205,760 bytes in size (315 * 4Mbytes) … the file cache is thrashing, since 1,321,205,760 is less than 1,331,100,000.
>
> I told you 425 as a max_open_files setting; the spreadsheet has 400 as a more conservative number.
>
> Matthew
>
> <leveldb_sizing_1.4.push.xls>
>
> On Jan 10, 2014, at 9:41 AM, Martin May <mar...@push.io> wrote:
>
>> Hi Matthew,
>>
>> We applied this change to node 4, started it up, and it seems much happier (no crazy CPU). We’re going to keep an eye on it for a little while, and then apply this setting to all the other nodes as well.
>>
>> Is there anything we can do to prevent this scenario in the future, or should the settings you suggested take care of that?
>>
>> Thanks,
>> Martin
>>
>> On Jan 10, 2014, at 6:42 AM, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>
>>> Sean,
>>>
>>> I did some math based upon the app.config and LOG files. I am guessing that you are starting to thrash your file cache.
>>>
>>> This theory should be easy to prove / disprove. On that one node, change the cache_size and max_open_files to:
>>>
>>> cache_size 68435456
>>> max_open_files 425
>>>
>>> If I am correct, the node should come up and not cause problems. We are trading block cache space for file cache space. A miss in the file cache is far more costly than a miss in the block cache.
>>>
>>> Let me know how this works for you. It is possible that we might want to talk about raising your block size slightly to reduce file cache overhead.
>>>
>>> Matthew
>>>
>>> On Jan 9, 2014, at 9:33 PM, Sean McKibben <grap...@graphex.com> wrote:
>>>
>>>> We have a 5-node cluster using eleveldb (1.4.2) and 2i, and this afternoon it started responding extremely slowly. CPU on member 4 was extremely high, and restarting that process didn’t help. We temporarily shut down member 4 and cluster speed returned to normal, but as soon as we boot member 4 back up, the cluster performance goes to shit.
>>>>
>>>> We’ve run into this before, but back then (before we migrated to this bare-metal cluster) we were able to just wipe the machines and start with a fresh set of data. Now it is causing some pretty significant issues and we’re not sure what we can do to get it back to normal. Many of our queues are filling up, and we’ll probably have to take node 4 offline again just so we can provide a regular quality of service.
>>>>
>>>> We’ve turned off AAE on node 4, but it hasn’t helped. We have some transfers that need to happen, but they are going very slowly.
>>>>
>>>> 'riak-admin top' on node 4 reports this:
>>>>
>>>> Load:  cpu    610    Memory:  total      503852    binary    231544
>>>>        procs  804             processes  179850    code       11588
>>>>        runq   134             atom          533    ets         4581
>>>>
>>>> Pid             Name or Initial Func   Time   Reds     Memory    MsgQ  Current Function
>>>> ---------------------------------------------------------------------------------------------------
>>>> <6175.29048.3>  proc_lib:init_p/5      '-'    462231   51356760  0     mochijson2:json_bin_is_safe/1
>>>> <6175.12281.6>  proc_lib:init_p/5      '-'    307183   64195856  1     gen_fsm:loop/7
>>>> <6175.1581.5>   proc_lib:init_p/5      '-'    286143   41085600  0     mochijson2:json_bin_is_safe/1
>>>> <6175.6659.0>   proc_lib:init_p/5      '-'    281845   13752     0     sext:decode_binary/3
>>>> <6175.6666.0>   proc_lib:init_p/5      '-'    209113   21648     0     sext:decode_binary/3
>>>> <6175.12219.6>  proc_lib:init_p/5      '-'    168832   16829200  0     riak_client:wait_for_query_results/4
>>>> <6175.8403.0>   proc_lib:init_p/5      '-'    133333   13880     1     eleveldb:iterator_move/2
>>>> <6175.8813.0>   proc_lib:init_p/5      '-'    119548   9000      1     eleveldb:iterator/3
>>>> <6175.8411.0>   proc_lib:init_p/5      '-'    115759   34472     0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
>>>> <6175.5679.0>   proc_lib:init_p/5      '-'    109577   8952      0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
>>>> Output server crashed: connection_lost
>>>>
>>>> Based on that, is there anything anyone can think of to try to bring performance back into the land of usability? Does this appear to be something that may have been resolved in 1.4.6 or 1.4.7?
>>>>
>>>> The only thing we can think of at this point is to remove (or force-remove) the member and join a freshly built one, but the last time we attempted that (on a different cluster), our secondary indexes were irreparably damaged and only regained consistency when we copied every individual key to (this) new cluster! Not a good experience :( but I’m hopeful that 1.4.6 may have addressed some of our issues.
>>>>
>>>> Any help is appreciated.
>>>>
>>>> Thank you,
>>>> Sean McKibben
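
A quick aside for anyone who wants to sanity-check Matthew's file-cache arithmetic: the back-of-the-envelope below walks the same numbers in Erlang. The module and function names are illustrative only, not part of Riak or leveldb, and every input value comes straight from Matthew's message above.

    %% Reproduces Matthew's file-cache sizing estimate from the thread.
    -module(leveldb_cache_math).
    -export([check/0]).

    check() ->
        KeysPerFile  = 75000,                        %% compaction key limit per .sst file
        KeysPerBlock = 10,                           %% Matthew's working estimate for 496-byte
                                                     %% values against a 4096-byte block threshold
        IndexKeys    = KeysPerFile div KeysPerBlock, %% 7,500 keys in the file's index
        IndexBytes   = IndexKeys * 100,              %% ~100 bytes per index key -> 750,000
        BloomBytes   = KeysPerFile * 2,              %% bloom filter, 2 bytes per key -> 150,000
        PerFileBytes = IndexBytes + BloomBytes,      %% 900,000 bytes of metadata per file
        TotalBytes   = PerFileBytes * 1479,          %% 1,479 files -> 1,331,100,000 bytes
        FourMB       = 4 * 1024 * 1024,
        %% For each candidate max_open_files, does 4 Mbytes * value cover the metadata?
        [{MaxOpenFiles, MaxOpenFiles * FourMB, MaxOpenFiles * FourMB >= TotalBytes}
         || MaxOpenFiles <- [315, 400, 425]].

Calling leveldb_cache_math:check() returns [{315,1321205760,false},{400,1677721600,true},{425,1782579200,true}]: the original 315 falls just short of the 1,331,100,000 bytes of metadata, which matches the thrashing diagnosis, while both 400 and 425 cover it.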
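
For completeness, the suggested values belong in the eleveldb section of app.config on each node. A minimal sketch of that section, assuming the rest of it is left untouched (cache_size is in bytes per vnode; max_open_files, per Matthew's note, counts 4-Mbyte units under 1.4):

    %% app.config, eleveldb section: values from Matthew's suggestion
    {eleveldb, [
        %% ... existing eleveldb settings unchanged ...
        {cache_size, 68435456},      %% block cache, in bytes per vnode
        {max_open_files, 425}        %% file cache: 425 * 4 Mbytes of metadata in 1.4
    ]}

Each node needs a restart for the new values to take effect, which matches the node-by-node rollout Martin describes above.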