Tom,

Basho prides itself on quickly responding to all user queries. I have failed that tradition in this case. Please accept my apologies.
The LOG data suggests leveldb is not stalling, especially not for 4 hours. Therefore the problem is related to disk utilization.

You appear to have large values. I see .sst files where the average value is 100K to 1 MB in size. Is this intentional, or might you have a sibling problem?

My assessment is that your lower levels are full and therefore cascading regularly. "Cascading" is like the typical champagne-glass pyramid you see at weddings: once all the glasses are full, new champagne poured at the top causes each subsequent layer to overflow into the one below it. You have the same problem, but with data. Your large values have filled each of the lower levels and regularly cause data to cascade between multiple levels. The cascading turns each 100K value write into the equivalent of a 300K or 500K write as levels overflow. This is chewing up your hard disk performance by reducing the amount of time the drive has available for read requests.

The leveldb code for Riak 2.0 has increased the size of all the levels. The table of sizes is found at the top of leveldb's db/version_set.cc. You could patch your current code if desired with this table from 2.0:

    {
        {10485760,  262144000, 57671680,  209715200,         0,                 420000000, true},
        {10485760,   82914560, 57671680,  419430400,         0,                 209715200, true},
        {10485760,  314572800, 57671680,  3082813440,        200000000,         314572800, false},
        {10485760,  419430400, 57671680,  6442450944ULL,     4294967296ULL,     419430400, false},
        {10485760,  524288000, 57671680,  128849018880ULL,   85899345920ULL,    524288000, false},
        {10485760,  629145600, 57671680,  2576980377600ULL,  1717986918400ULL,  629145600, false},
        {10485760,  734003200, 57671680,  51539607552000ULL, 34359738368000ULL, 734003200, false}
    };

You cannot take the entire 2.0 leveldb into your 1.4 code base due to various option changes. Let me know if this helps.

I have previously hypothesized that "grooming" compactions should be limited to one thread total.
However, my test datasets never demonstrated a benefit. Your dataset might be the case that proves the benefit. I will go find the grooming patch to hot_threads for you if the above table proves insufficient.

Matthew

On Jul 2, 2014, at 9:20 PM, Tom Lanyon <tom+r...@oneshoeco.com> wrote:

> Hi Matthew,
>
> Just thought I'd see whether you were back from your travels and had had a
> chance to take a look at the log file provided?
>
> There's no rush if you haven't had a chance!
>
> Regards,
> Tom
>
>
> On Tuesday, 24 June 2014 at 10:45, Tom Lanyon wrote:
>
>> No problem, Matthew.
>>
>> Appreciate you taking a look when you have time.
>>
>> Regards,
>> Tom
>>
>>
>> On Tuesday, 24 June 2014 at 9:45, Matthew Von-Maszewski wrote:
>>
>>> Tom,
>>>
>>> I have been distracted today and on a plane tomorrow. I apologize for the
>>> delayed response. It may be late tomorrow before I can share further
>>> thoughts.
>>>
>>> Again my apologies.
>>>
>>> Matthew Von-Maszewski
>>>
>>>
>>> On Jun 23, 2014, at 8:58, Tom Lanyon <tom+r...@oneshoeco.com> wrote:
>>>
>>>> Thanks; the combined_log for our Riak node 3 is here:
>>>>
>>>> https://www.dropbox.com/s/krhhwnplpeyhl0c/riak3-combined_log-20140623.log.gz
>>>>
>>>> Let me know if you can't retrieve/view it.
>>>>
>>>> With timestamps relative to this log file, at 2014/06/23-05:35 our
>>>> monitoring detected node3's Riak as "down"; it wasn't serving any client
>>>> protobuf requests, "riak ping" didn't respond and all of the other nodes
>>>> marked node 3 as unreachable. We watched the process and it was busy doing
>>>> leveldb compactions, so we left it alone and it eventually recovered at
>>>> 2014/06/23-09:32 (so ~4 hours unresponsive).
>>>>
>>>> Yes - this cluster started at 1.2.1 and then I believe it went to 1.3.1,
>>>> 1.4.2 and now 1.4.8. However, we went from 1.3.1 -> 1.4.2 in September 2013
>>>> and 1.4.2 -> 1.4.8 in May, so we've been running 1.4.x for many months -
>>>> does this fit with the 'one time cost of upgrading' you mentioned?
>>>>
>>>> Regards,
>>>> Tom
>>>>
>>>>
>>>> On Monday, 23 June 2014 at 19:29, Matthew Von-Maszewski wrote:
>>>>
>>>>> Yes, off list is fine for the data files. I may or may not respond via
>>>>> the list depending upon what I find.
>>>>>
>>>>> I did recall a case where leveldb seems unresponsive for hours. This case
>>>>> was a one-time cost of upgrading some 1.2 or 1.3 systems to 1.4. Would
>>>>> that happen to describe your scenario?
>>>>>
>>>>> Matthew Von-Maszewski
>>>>>
>>>>>
>>>>> On Jun 23, 2014, at 0:28, Tom Lanyon <tom+r...@oneshoeco.com> wrote:
>>>>>
>>>>>> Hi Matthew,
>>>>>>
>>>>>> Thanks for the response and apologies for my off-list reply.
>>>>>>
>>>>>> I can send a combined_log example directly to you if that helps? It's
>>>>>> 13MB gzip'ed.
>>>>>>
>>>>>> Regards,
>>>>>> Tom
>>>>>>
>>>>>>
>>>>>> On Monday, 23 June 2014 at 12:30, Matthew Von-Maszewski wrote:
>>>>>>
>>>>>>> Hot threads is included with 1.4.9. The leveldb source file
>>>>>>> leveldb/util/hot_threads.cc is the key file.
>>>>>>>
>>>>>>> The code helps throughput, but is not magical. "Unresponsive for hours"
>>>>>>> is not a known problem in the 1.4.x code base. Would you mind posting
>>>>>>> an aggregate LOG file from a period when this happens?
>>>>>>>
>>>>>>>     sort /var/lib/riak/*/LOG > combined_log
>>>>>>>
>>>>>>> Substitute your actual data path for /var/lib/riak.
>>>>>>>
>>>>>>> Matthew Von-Maszewski
>>>>>>>
>>>>>>>
>>>>>>> On Jun 22, 2014, at 22:07, Tom Lanyon <tom+r...@oneshoeco.com> wrote:
>>>>>>>
>>>>>>>> Could someone please confirm whether 1.4.9 includes "Hot Threads" in
>>>>>>>> leveldb?
>>>>>>>>
>>>>>>>> The release notes have a link to it, but I couldn't find my way
>>>>>>>> through the rebar & git maze to be absolutely sure it is in 1.4.9 but
>>>>>>>> not 1.4.8.
>>>>>>>>
>>>>>>>> We're seeing nodes unresponsive for hours during large compactions and
>>>>>>>> wondered if this leveldb improvement would help.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Tom
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> riak-users mailing list
>>>>>>>> riak-users@lists.basho.com
>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com