Hi Matthew, see inline..

On Tue, 10 Dec 2013 10:38:03 -0500
Matthew Von-Maszewski <matth...@basho.com> wrote:

> The sad truth is that you are not the first to see this problem. And yes, it
> has to do with your 950GB per node dataset. And no, nothing to do but sit
> through it at this time.
>
> While I did extensive testing around upgrade times before shipping 1.4,
> apparently there are data configurations I did not anticipate. You are
> likely seeing a cascade where a shift of one file from level-1 to level-2 is
> causing a shift of another file from level-2 to level-3, which causes a
> level-3 file to shift to level-4, etc … then the next file shifts from
> level-1.
>
> The bright side of this pain is that you will end up with better write
> throughput once all the compaction ends.

I can live with that, but my problem now is this: if I upgrade node by
node, it looks like 2i searches aren't possible while 1.3 and 1.4 nodes
coexist in the cluster. Is there any problem that would lead me into a 2i
repair marathon, or could I simply wait a few hours per node until all
merges are done before I upgrade the next one? (2i searches can fail for
some time; the app has no problem with that. But are new inserts with 2i
indices processed successfully, or do I have to run the 2i repair?)

/s

One other good thing: saving disk space is an advantage ;)

>
> Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but that
> is not going to help you today.
>
> Matthew
>
> On Dec 10, 2013, at 10:26 AM, Simon Effenberg <seffenb...@team.mobile.de>
> wrote:
>
> > Hi @list,
> >
> > I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 .. after
> > upgrading the first node (out of 12) this node seems to do many merges.
> > the sst_* directories change in size "rapidly" and the node is having
> > a disk utilization of 100% all the time.
> >
> > I know that there is something like that:
> >
> > "The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
> > will initiate an automatic conversion that could pause the startup of
> > each node by 3 to 7 minutes. The leveldb data in "level #1" is being
> > adjusted such that "level #1" can operate as an overlapped data level
> > instead of as a sorted data level. The conversion is simply the
> > reduction of the number of files in "level #1" to being less than eight
> > via normal compaction of data from "level #1" into "level #2". This is
> > a one time conversion."
> >
> > but it looks much more invasive than explained here or doesn't have to
> > do anything with the (probably seen) merges.
> >
> > Is this "normal" behavior or could I do anything about it?
> >
> > At the moment I'm stuck with the upgrade procedure because this high
> > IO load would probably lead to high response times.
> >
> > Also we have a lot of data (per node ~950 GB).
> >
> > Cheers
> > Simon
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

-- 
Simon Effenberg | Site Ops Engineer | mobile.international GmbH
Phone: + 49-(0)30-8109 - 7173
Fax: + 49-(0)30-8109 - 7131
Mail: seffenb...@team.mobile.de
Web: www.mobile.de
Marktplatz 1 | 14532 Europarc Dreilinden | Germany
Managing Director: Malte Krüger
HRB No.: 18517 P, Amtsgericht Potsdam
Registered office: Kleinmachnow
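
The cascade described in the quoted reply can be pictured with a toy model. This is only a minimal sketch, not leveldb's actual compaction logic. The function name `shift_down`, the per-level file counts and the limit of 4 files per level are all assumptions made up for illustration. It shows how moving a single file out of level-1 can trigger one move at every lower level when those levels are already full.

```python
def shift_down(files, limits, level):
    """Move one file from `level` to `level + 1` and keep cascading
    while the receiving level ends up over its (hypothetical) limit.

    `files[i]` is the number of files currently in level i and
    `limits[i]` is the assumed maximum for that level.
    Returns the list of (source_level, target_level) moves performed.
    """
    moves = []
    while level + 1 < len(files):
        files[level] -= 1
        files[level + 1] += 1
        moves.append((level, level + 1))
        level += 1
        # Stop as soon as the receiving level is within its limit.
        if files[level] <= limits[level]:
            break
    return moves


# Hypothetical starting state: level-1 is overfull, levels 2 to 4 are
# exactly at their limit, level-5 is empty.
files = [0, 10, 4, 4, 4, 0]
limits = [4] * len(files)

print(shift_down(files, limits, 1))
print(files)
```

With these made-up numbers, one shift out of level-1 causes four moves in total (level-1 to level-2, level-2 to level-3, level-3 to level-4, level-4 to level-5), and level-1 still holds 9 files afterwards, so the next shift from level-1 starts the same chain again.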