It would be helpful if you described the physical characteristics of the 
servers: memory size, logical CPU count, etc.

Google created leveldb to be highly reliable in the face of crashes.  If it 
will not restart, that suggests to me a low-memory condition that prevents 
leveldb's MANIFEST file from loading.  That is easily fixed by moving the 
dataset to a machine with more memory.

There is also a special flag to reduce Riak's leveldb memory footprint during 
development work.  The setting reduces leveldb performance, but lets you run 
with less memory.

In riak.conf, set:

leveldb.limited_developer_mem = true
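
If it helps, a minimal sequence on each affected node might be (assuming the 
packaged layout with riak.conf in /etc/riak/):

    echo 'leveldb.limited_developer_mem = true' >> /etc/riak/riak.conf
    riak chkconfig           # confirm the file still parses
    riak start

The setting is read at node start, so it has to be in place before you bring 
the node back up; remove it again once you are on hardware with enough memory.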

Matthew


> On Jul 12, 2016, at 11:56 AM, Vikram Lalit <vikramla...@gmail.com> wrote:
> 
> Hi - I've been testing a Riak cluster (of 3 nodes) with an ejabberd messaging 
> cluster in front of it that writes data to the Riak nodes. Whilst load 
> testing the platform (by creating 0.5 million ejabberd users via Tsung), I 
> found that the Riak nodes suddenly crashed. My question is how do we recover 
> from such a situation if it were to occur in production?
> 
> To provide further context / details, the leveldb log files storing the data 
> suddenly became too huge, thus making the AWS Riak instances not able to load 
> them in memory anymore. So we get a core dump if 'riak start' is fired on 
> those instances. I had an n_val = 2, and all 3 nodes went down almost 
> simultaneously, so in such a scenario, we cannot even rely on a 2nd copy of 
> the data. One way to prevent it in the first place would of course be to use 
> auto-scaling, but I'm wondering whether there is an ex post facto / after-the-fact 
> recovery that can be performed in such a scenario? Is it possible to simply 
> copy the leveldb data to a larger-memory instance, or to curtail the data 
> further to allow loading on the same instance?
> 
> Appreciate if you can provide inputs - a tad concerned as to how we could 
> recover from such a situation if it were to happen in production (apart from 
> leveraging auto-scaling as a preventive measure).
> 
> Thanks!
> 

