Sargun,

Regarding 1) - AAE is disabled. We had problems with it, and there are a lot
of threads on this mailing list about it. AAE kept using more and more disk
space, and the only solution was to disable it. Since then the cluster has
been pretty stable...
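
As a back-of-the-envelope check on the disk headroom involved, here is a
minimal sketch. It assumes the six new nodes have the same disk capacity as
the current ones and that data ends up evenly spread after handoff; neither
is confirmed in this thread, and the function name is only illustrative.

```python
def projected_utilization(current_nodes, current_pct, added_nodes):
    """Estimate per-node disk utilization (in percent) after adding nodes.

    Assumptions (not from the thread): every node has identical disk
    capacity, and data is perfectly balanced once handoff completes.
    """
    if current_nodes <= 0 or added_nodes < 0:
        raise ValueError("current_nodes must be > 0 and added_nodes >= 0")
    total_nodes = current_nodes + added_nodes
    # Same amount of data, spread over more nodes of equal size.
    return current_pct * current_nodes / total_nodes


if __name__ == "__main__":
    # 6 nodes at 85% growing to 12 nodes.
    print(f"{projected_utilization(6, 85.0, 6):.1f}%")
```

Under those assumptions, doubling the node count roughly halves per-node
usage. It says nothing about the temporary extra space needed during handoff.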

Regarding 6) - Can you or anyone at Basho confirm that there won't be any
problems running the latest version of Riak (1.4.12) on the new nodes and
only upgrading the old ones after this process is completed?

Thanks a lot for the other tips, you've been very helpful!
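
To check my understanding of tips 3 and 5, here is a minimal dry-run sketch
that only builds the list of commands and executes nothing. The node names
are hypothetical. I am assuming that the "handoff_limit" in tip 5 is what
`riak-admin transfer-limit` controls, and that the staged `riak-admin
cluster join/plan/commit` commands behave in 1.4.x as I remember them from
the documentation, so please verify against the docs before running anything.

```python
def staged_join_commands(new_nodes, seed_node, handoff_limit=1):
    """Return (run_on_node, argv) pairs for adding all new nodes in one plan.

    Nothing is executed; this only describes the intended order of steps.
    Node names passed in are hypothetical placeholders.
    """
    commands = []
    # Stage every join first so a single migration plan covers all nodes.
    for node in new_nodes:
        commands.append((node, ["riak-admin", "cluster", "join", seed_node]))
    # Start conservatively with a low transfer limit (assumed to be the
    # "handoff_limit" mentioned in the thread); raise it later if all is well.
    commands.append(
        (seed_node, ["riak-admin", "transfer-limit", str(handoff_limit)])
    )
    # Review the staged plan, then commit it.
    commands.append((seed_node, ["riak-admin", "cluster", "plan"]))
    commands.append((seed_node, ["riak-admin", "cluster", "commit"]))
    # Afterwards, watch the pending transfers.
    commands.append((seed_node, ["riak-admin", "transfers"]))
    return commands


if __name__ == "__main__":
    new = [f"riak@10.0.0.{i}" for i in range(7, 13)]  # hypothetical names
    for run_on, argv in staged_join_commands(new, "riak@10.0.0.1"):
        print(f"[{run_on}] {' '.join(argv)}")
```

The new nodes would stay out of the load balancer until the transfers list is
empty, as per tip 4.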

Best regards,
Edgar

On 24 January 2015 at 21:09, Sargun Dhillon <sar...@sargun.me> wrote:

> Several things:
> 1) If you have data at rest that doesn't change, make sure you have
> AAE enabled, and that it has run before your cluster is manipulated. Given that
> you're running at 85% space, I would be a little worried to turn it
> on, because you might run out of disk space. You can also pretty
> reasonably put the AAE trees on magnetic storage. AAE is nice in the
> sense that you _know_ your cluster is consistent at a point in time.
>
> 2) Make sure you're getting SSDs of roughly the same quality. I've
> seen enterprise SSDs get higher and higher latency as time goes on,
> due to greater data protection features. We don't need any of that.
> Basho_bench is your friend if you have the time.
>
> 3) Do it all in one go. That way the handoffs are planned cleanly and
> happen all at once.
>
> 4) Do not add the new nodes to the load balancer until handoff is
> done. At least experimentally, latency increases slightly on the
> original cluster, but the target nodes have pretty awful latency.
>
> 5) Start with a handoff_limit of 1. You can easily raise this. If
> things look good, you can increase it. We're not optimizing for the
> total time to hand off; we really should be optimizing for individual
> vnode handoff time.
>
> 6) If you're using LevelDB, upgrade to the most recent version of Riak
> 1.4. There have been some improvements. 1.4.9 made me happier. I think
> it's reasonable for the new nodes to start on 1.4.12, and the old
> nodes to be switched over later.
>
> 7) Watch your network utilization. Keep your disk latency flat. Stop
> the handoff if latency spikes. Start by enabling one node with the
> lowest usage and see if it works.
>
>
> These are the things I can think of immediately.
>
> On Sat, Jan 24, 2015 at 12:42 PM, Alexander Sicular <sicul...@gmail.com>
> wrote:
> > I would probably add them all in one go so you have one vnode migration
> > plan that gets executed. What is your ring size? How much data are we
> > talking about? It's not necessarily the number of keys but rather the
> > total amount of data and how quickly that data can move en masse between
> > machines.
> >
> > -Alexander
> >
> >
> > @siculars
> > http://siculars.posthaven.com
> >
> > Sent from my iRotaryPhone
> >
> >> On Jan 24, 2015, at 15:37, Ed <edgarmve...@gmail.com> wrote:
> >>
> >> Hi everyone!
> >>
> >> I have a Riak cluster, working in production for about one year, with
> >> the following characteristics:
> >> - Version 1.4.8
> >> - 6 nodes
> >> - LevelDB backend
> >> - replication (n) = 3
> >> - ~3 billion keys
> >>
> >> My SSDs are reaching 85% of capacity and we have decided to buy 6 more
> >> nodes to expand the cluster.
> >>
> >> Have you got any kind of advice on executing this operation or should I
> >> just follow the documentation on adding new nodes to a cluster?
> >>
> >> Best regards!
> >> Edgar
> >>
> >> _______________________________________________
> >> riak-users mailing list
> >> riak-users@lists.basho.com
> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
>
