Hi Matthew,

See inline.

On Tue, 10 Dec 2013 10:38:03 -0500
Matthew Von-Maszewski <matth...@basho.com> wrote:

> The sad truth is that you are not the first to see this problem.  And yes, it 
> has to do with your 950GB per node dataset.  And no, nothing to do but sit 
> through it at this time.
> 
> While I did extensive testing around upgrade times before shipping 1.4, 
> apparently there are data configurations I did not anticipate.  You are 
> likely seeing a cascade where a shift of one file from level-1 to level-2 is 
> causing a shift of another file from level-2 to level-3, which causes a 
> level-3 file to shift to level-4, etc … then the next file shifts from 
> level-1.
> 
> The bright side of this pain is that you will end up with better write 
> throughput once all the compaction ends.
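To make the cascade Matthew describes concrete, here is a toy Python model. It is not leveldb's actual algorithm. It assumes, hypothetically, that every level has a fixed file-count limit and that one compaction moves exactly one file down one level. Real leveldb merges overlapping key ranges and sizes its levels in bytes. The only rule taken from this thread is the 1.4.0 note quoted further down: level 1 is compacted into level 2 until it holds fewer than eight files. The names `push_down`, `convert_level1` and all the numbers are hypothetical.

```python
# Toy model of a leveldb-style compaction cascade.
# Hypothetical assumptions: per-level file-count limits, and one compaction
# moves exactly one file down one level. Real leveldb merges overlapping
# key ranges and sizes levels in bytes; this only shows the chain reaction.

LEVEL1_MAX_FILES = 8  # 1.4.0 note: level 1 must end with fewer than 8 files


def push_down(files, limits, level, moves):
    """Move one file from `level` to `level + 1`, then keep cascading while
    the receiving level exceeds its (hypothetical) limit. The last level is
    treated as unbounded."""
    while level + 1 < len(files):
        files[level] -= 1
        files[level + 1] += 1
        moves.append((level, level + 1))
        level += 1
        if files[level] <= limits[level]:
            break  # receiving level is within its limit; cascade stops


def convert_level1(files, limits):
    """Compact level 1 into level 2 until level 1 holds fewer than 8 files.
    Mutates `files` and returns the (from_level, to_level) moves made."""
    if len(files) < 3:
        raise ValueError("need at least levels 0, 1 and 2")
    moves = []
    while files[1] >= LEVEL1_MAX_FILES:
        push_down(files, limits, 1, moves)
    return moves


if __name__ == "__main__":
    files = [0, 10, 10, 100, 1000]   # hypothetical file counts, levels 0..4
    limits = [4, 8, 10, 100, 10**9]  # hypothetical limits (level 1's is unused)
    moves = convert_level1(files, limits)
    print(len(moves), files)         # 9 [0, 7, 10, 100, 1003]
```

In the example only three files leave level 1, but because levels 2 and 3 start at their limits, each of those files pushes another file all the way down to level 4, for nine compactions in total.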

I'll have to live with that. My problem right now is that, since I'm
upgrading node by node, 2i searches don't seem to work while both 1.3
and 1.4 nodes exist in the cluster. Is there any risk that this leaves
me with a 2i repair marathon, or can I simply wait a few hours on each
node until all merges are done before upgrading the next one? (2i
searches can fail for a while; the app can cope with that. But are new
inserts with 2i indexes processed successfully, or will I have to run a
2i repair?)

/s

One other good thing: saving disk space is an advantage too ;)


> 
> Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but that 
> is not going to help you today.
> 
> Matthew
> 
> On Dec 10, 2013, at 10:26 AM, Simon Effenberg <seffenb...@team.mobile.de> 
> wrote:
> 
> > Hi @list,
> > 
> > I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2. After
> > upgrading the first node (out of 12), this node seems to be doing many
> > merges. The sst_* directories change in size "rapidly" and the node is
> > at 100% disk utilization all the time.
> > 
> > I know there is something like this:
> > 
> > "The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
> > will initiate an automatic conversion that could pause the startup of
> > each node by 3 to 7 minutes. The leveldb data in "level #1" is being
> > adjusted such that "level #1" can operate as an overlapped data level
> > instead of as a sorted data level. The conversion is simply the
> > reduction of the number of files in "level #1" to being less than eight
> > via normal compaction of data from "level #1" into "level #2". This is
> > a one time conversion."
> > 
> > but it looks much more invasive than described here, or the merges I'm
> > probably seeing have nothing to do with it.
> > 
> > Is this "normal" behavior, or is there anything I can do about it?
> > 
> > At the moment I'm stuck in the upgrade procedure because this high
> > IO load would probably lead to high response times.
> > 
> > Also, we have a lot of data (~950 GB per node).
> > 
> > Cheers
> > Simon
> > 
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 


-- 
Simon Effenberg | Site Ops Engineer | mobile.international GmbH
Fon:     + 49-(0)30-8109 - 7173
Fax:     + 49-(0)30-8109 - 7131

Mail:     seffenb...@team.mobile.de
Web:    www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany


Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow 
