Tom,

Basho prides itself on quickly responding to all user queries.  I have failed 
that tradition in this case.  Please accept my apologies.


The LOG data suggests leveldb itself is not stalling, and certainly not for 4 
hours.  That points to disk utilization as the problem instead.

You appear to have large values.  I see .sst files where the average value is 
100 KB to 1 MB in size.  Is this intentional, or might you have a sibling 
problem?


My assessment is that your lower levels are full and therefore cascading 
regularly.  "Cascading" is like the champagne glass pyramid you see at 
weddings.  Once all the glasses are full, new champagne at the top causes each 
layer to overflow into the one below it.  You have the same problem, but with 
data.  
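
To make the analogy concrete, here is a minimal Python sketch of a cascade.  
It is a toy model, not leveldb's algorithm: real compactions move whole .sst 
files by key range rather than raw byte counts.  The function name, the level 
capacities and the fill amounts are all made up for illustration.

```python
def pour(fill, capacity, incoming):
    """Add `incoming` KB to level 0 and push any overflow down a level.

    `fill` is updated in place.  The last level is treated as bottomless.
    Returns how many levels had data written to them.
    """
    touched = 0
    carry = incoming
    for i in range(len(fill)):
        if carry == 0:
            break
        fill[i] += carry
        touched += 1
        is_last = i == len(fill) - 1
        if is_last or fill[i] <= capacity[i]:
            carry = 0  # this level absorbed everything
        else:
            carry = fill[i] - capacity[i]  # overflow spills downward
            fill[i] = capacity[i]
    return touched


# Hypothetical capacities in KB, smallest level first.
capacity = [1_000, 10_000, 100_000]

half_empty = [200, 3_000, 40_000]
print(pour(half_empty, capacity, 100))   # only level 0 is written

all_full = [1_000, 10_000, 40_000]
print(pour(all_full, capacity, 100))     # every level is written
```

With room to spare, a new write touches one level.  Once the upper levels are 
full, the same write touches every level on the way down.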

Your large values have filled each of the lower levels and regularly cause 
data to cascade across multiple levels.  The cascading turns each 100 KB 
value write into the equivalent of a 300 KB or 500 KB write as levels 
overflow.  This is chewing up your hard disk performance (by reducing the 
amount of time the hard drive has available for read requests).
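
One way to read the 300 KB and 500 KB figures is that a value is written once 
on arrival and then rewritten once for every level it cascades into.  The 
Python sketch below does that arithmetic.  This is a simplified model assumed 
here for illustration (a real compaction also rewrites overlapping data 
already in the destination level), and the function name is hypothetical.

```python
def disk_kb_for_write(value_kb, levels_cascaded):
    """KB written to disk for one value under the simple model:
    one initial write plus one rewrite per level it cascades into."""
    return value_kb * (1 + levels_cascaded)


value_kb = 100
for cascades in range(5):
    total = disk_kb_for_write(value_kb, cascades)
    factor = total // value_kb
    print(f"{cascades} cascades: {total} KB written ({factor}x)")
```

Under this model, two cascades give 300 KB and four give 500 KB for a single 
100 KB value.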

The leveldb code for Riak 2.0 has increased the size of all the levels.  The 
table of sizes is found at the top of leveldb's db/version_set.cc.  You could 
patch your current code if desired with this table from 2.0:

{
    {10485760,  262144000,  57671680,      209715200,                 0,  420000000, true},
    {10485760,   82914560,  57671680,      419430400,                 0,  209715200, true},
    {10485760,  314572800,  57671680,     3082813440,         200000000,  314572800, false},
    {10485760,  419430400,  57671680,     6442450944ULL,     4294967296ULL,  419430400, false},
    {10485760,  524288000,  57671680,   128849018880ULL,    85899345920ULL,  524288000, false},
    {10485760,  629145600,  57671680,  2576980377600ULL,  1717986918400ULL,  629145600, false},
    {10485760,  734003200,  57671680, 51539607552000ULL, 34359738368000ULL,  734003200, false}
};
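
Before patching, it can help to see these byte counts in readable units.  The 
following Python sketch prints each row that way.  It assumes one row per 
level, in order, and deliberately does not label the columns: their meanings 
are not given in this message, so check the struct definition next to the 
table in db/version_set.cc.  The helper name is hypothetical.

```python
# Rows copied from the table above (C++ `ULL` suffixes dropped,
# true/false written as Python booleans).
ROWS = [
    (10485760, 262144000, 57671680, 209715200, 0, 420000000, True),
    (10485760, 82914560, 57671680, 419430400, 0, 209715200, True),
    (10485760, 314572800, 57671680, 3082813440, 200000000, 314572800, False),
    (10485760, 419430400, 57671680, 6442450944, 4294967296, 419430400, False),
    (10485760, 524288000, 57671680, 128849018880, 85899345920, 524288000, False),
    (10485760, 629145600, 57671680, 2576980377600, 1717986918400, 629145600, False),
    (10485760, 734003200, 57671680, 51539607552000, 34359738368000, 734003200, False),
]


def human(n):
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


# Assumption: row index corresponds to leveldb level number.
for level, row in enumerate(ROWS):
    sizes = "  ".join(f"{human(v):>11}" for v in row[:-1])
    print(f"row {level}: {sizes}  {row[-1]}")
```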
                                  

You cannot take the entire 2.0 leveldb into your 1.4 code base due to various 
option changes.


Let me know if this helps.  I have previously hypothesized that "grooming" 
compactions should be limited to one thread total.  However, my test datasets 
never demonstrated a benefit.  Your dataset might be the case that proves the 
benefit.  I will go find the grooming patch to hot_threads for you if the above 
table proves insufficient.

Matthew




On Jul 2, 2014, at 9:20 PM, Tom Lanyon <tom+r...@oneshoeco.com> wrote:

> Hi Matthew, 
> 
> Just thought I'd see whether you were back from your travels and had had a 
> chance to take a look at the log file provided?
> 
> There's no rush if you haven't had a chance!
> 
> Regards,
> Tom
> 
> 
> On Tuesday, 24 June 2014 at 10:45, Tom Lanyon wrote:
> 
>> No problem, Matthew. 
>> 
>> Appreciate you taking a look when you have time.
>> 
>> Regards,
>> Tom
>> 
>> 
>> On Tuesday, 24 June 2014 at 9:45, Matthew Von-Maszewski wrote:
>> 
>>> Tom,
>>> 
>>> I have been distracted today and on a plane tomorrow. I apologize for the 
>>> delayed response. It may be late tomorrow before I can share further 
>>> thoughts. 
>>> 
>>> Again my apologies.
>>> 
>>> Matthew Von-Maszewski
>>> 
>>> 
>>> On Jun 23, 2014, at 8:58, Tom Lanyon <tom+r...@oneshoeco.com> wrote:
>>> 
>>>> Thanks; the combined_log for our Riak node 3 is here:
>>>> 
>>>> https://www.dropbox.com/s/krhhwnplpeyhl0c/riak3-combined_log-20140623.log.gz
>>>> 
>>>> Let me know if you can't retrieve/view it.
>>>> 
>>>> With timestamps relative to this log file, at 2014/06/23-05:35 our 
>>>> monitoring detected node3's Riak as "down"; it wasn't serving any client 
>>>> protobuf requests, "riak ping" didn't respond and all of the other nodes 
>>>> marked node 3 as unreachable. We watched the process and it was busy doing 
>>>> leveldb compactions so we left it alone and it eventually recovered at 
>>>> 2014/06/23-09:32 (so ~4 hours unresponsive).
>>>> 
>>>> Yes - this cluster started at 1.2.1 and then I believe it went to 1.3.1, 
>>>> 1.4.2 and now 1.4.8. However, we went from 1.3.1-->1.4.2 in September 2013 
>>>> and 1.4.2-->1.4.8 in May, so we've been running 1.4.x for many months - 
>>>> does this fit with the 'one time cost of upgrading' you mentioned?
>>>> 
>>>> Regards,
>>>> Tom
>>>> 
>>>> 
>>>> On Monday, 23 June 2014 at 19:29, Matthew Von-Maszewski wrote:
>>>> 
>>>>> Yes, off list is fine for the data files. I may or may not respond via 
>>>>> the list depending upon what I find.
>>>>> 
>>>>> I did recall a case where leveldb seems unresponsive for hours. This case 
>>>>> was a one time cost of upgrading some 1.2 or 1.3 systems to 1.4. Would 
>>>>> that happen to describe your scenario?
>>>>> 
>>>>> Matthew Von-Maszewski
>>>>> 
>>>>> 
>>>>> On Jun 23, 2014, at 0:28, Tom Lanyon <tom+r...@oneshoeco.com> wrote:
>>>>> 
>>>>>> Hi Matthew, 
>>>>>> 
>>>>>> Thanks for the response and apologies for my off-list reply.
>>>>>> 
>>>>>> I can send a combined_log example directly to you if that helps? It's 
>>>>>> 13MB gzip'ed.
>>>>>> 
>>>>>> Regards,
>>>>>> Tom
>>>>>> 
>>>>>> 
>>>>>> On Monday, 23 June 2014 at 12:30, Matthew Von-Maszewski wrote:
>>>>>> 
>>>>>>> Hot threads is included with 1.4.9. The leveldb source file 
>>>>>>> leveldb/util/hot_threads.cc is the key file.
>>>>>>> 
>>>>>>> The code helps throughput, but is not magical. "unresponsive for hours" 
>>>>>>> is not a known problem in the 1.4.x code base. Would you mind posting 
>>>>>>> an aggregate LOG file from a period when this happens?
>>>>>>> 
>>>>>>> sort /var/lib/riak/*/LOG >combined_log
>>>>>>> 
>>>>>>> Substitute your actual data path for /var/lib/riak.
>>>>>>> 
>>>>>>> Matthew Von-Maszewski
>>>>>>> 
>>>>>>> 
>>>>>>> On Jun 22, 2014, at 22:07, Tom Lanyon <tom+r...@oneshoeco.com> wrote:
>>>>>>> 
>>>>>>>> Could someone please confirm whether 1.4.9 includes "Hot Threads" in 
>>>>>>>> leveldb? 
>>>>>>>> 
>>>>>>>> The release notes have a link to it, but I couldn't find my way 
>>>>>>>> through the rebar & git maze to be absolutely sure it is in 1.4.9 but 
>>>>>>>> not 1.4.8.
>>>>>>>> 
>>>>>>>> We're seeing nodes unresponsive for hours during large compactions and 
>>>>>>>> wondered if this leveldb improvement would help.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Tom
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> riak-users mailing list
>>>>>>>> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 

