Raviraj,

Please run 'riak-debug'. This is in the bin directory along with 'riak start' and 'riak-admin'. riak-debug will produce a file with a name similar to /home/user/r...@10.0.0.15-riak-debug.tar.gz

You should email that file to me directly, or post it to Dropbox or similar and send me a link. You do not want to send that file to the entire mailing list.

I will review the file and suggest next steps.

Matthew

> On Feb 22, 2016, at 5:13 AM, Raviraj Vaishampayan <rvaishampa...@vmware.com> wrote:
>
> Hi,
>
> We have been using riak to gather our test data and analyze results after
> the tests complete.
> Recently we have observed riak crashes in the riak console logs.
> This causes our tests to fail to record data to riak and bail out :-(
>
> The crash logs are as follows:
> 2016-02-19 16:25:26.255 [error] <0.2160.0> gen_fsm <0.2160.0> in state active
> terminated with reason: no function clause matching
> riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}},
> {state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
> line 1195
> 2016-02-19 16:25:26.260 [error] <0.2160.0> CRASH REPORT Process <0.2160.0>
> with 2 neighbours exited with reason: no function clause matching
> riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}},
> {state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
> line 1195 in gen_fsm:terminate/7 line 622
> 2016-02-19 16:25:26.260 [error] <0.172.0> Supervisor riak_core_vnode_sup had
> child undefined started with {riak_core_vnode,start_link,undefined} at
> <0.2160.0> exit with reason no function clause matching
> riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}},
> {state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
> line 1195 in context child_terminated
> 2016-02-19 16:25:26.261 [error] <0.4319.0> gen_fsm <0.4319.0> in state ready
> terminated with reason: no function clause matching
> riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}},
> {state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
> line 1195
> 2016-02-19 16:25:26.275 [error] <0.4319.0> CRASH REPORT Process <0.4319.0>
> with 10 neighbours exited with reason: no function clause matching
> riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}},
> {state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
> line 1195 in gen_fsm:terminate/7 line 622
> 2016-02-19 16:25:26.278 [error] <0.4320.0> Supervisor
> {<0.4320.0>,poolboy_sup} had child riak_core_vnode_worker started with
> riak_core_vnode_worker:start_link([{worker_module,riak_core_vnode_worker},{worker_args,[268322566228720457638957762256505085639956365312,...]},...])
> at undefined exit with reason no function clause matching
> riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}},
> {state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
> line 1195 in context shutdown_error
> 2016-02-19 16:25:26.278 [error] <0.4320.0> gen_server <0.4320.0> terminated
> with reason: no function clause matching
> riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}},
> {state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
> line 1195
> 2016-02-19 16:25:26.278 [error] <0.4320.0> CRASH REPORT Process <0.4320.0>
> with 0 neighbours exited with reason: no function clause matching
> riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}},
> {state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
> line 1195 in gen_server:terminate/6 line 744
> 2016-02-19 16:25:26.806 [error] <0.2157.0> gen_fsm <0.2157.0> in state active
> terminated with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}}
> 2016-02-19 16:25:26.808 [error] <0.2157.0> CRASH REPORT Process <0.2157.0>
> with 2 neighbours exited with reason:
> {timeout,{gen_server,call,[<0.5141.0>,stop]}} in gen_fsm:terminate/7 line 600
> 2016-02-19 16:25:26.809 [error] <0.5450.0> gen_fsm <0.5450.0> in state ready
> terminated with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}}
> 2016-02-19 16:25:26.809 [error] <0.172.0> Supervisor riak_core_vnode_sup had
> child undefined started with {riak_core_vnode,start_link,undefined} at
> <0.2157.0> exit with reason {timeout,{gen_server,call,[<0.5141.0>,stop]}} in
> context child_terminated
> 2016-02-19 16:25:26.809 [error] <0.5450.0> CRASH REPORT Process <0.5450.0>
> with 10 neighbours exited with reason:
> {timeout,{gen_server,call,[<0.5141.0>,stop]}} in gen_fsm:terminate/7 line 622
> 2016-02-19 16:25:26.809 [error] <0.5451.0> Supervisor
> {<0.5451.0>,poolboy_sup} had child riak_core_vnode_worker started with
> riak_core_vnode_worker:start_link([{worker_module,riak_core_vnode_worker},{worker_args,[211232658520482062396626323478525280184646500352,...]},...])
> at undefined exit with reason {timeout,{gen_server,call,[<0.5141.0>,stop]}}
> in context shutdown_error
> 2016-02-19 16:25:26.809 [error] <0.5451.0> gen_server <0.5451.0> terminated
> with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}}
> 2016-02-19 16:25:26.809 [error] <0.5451.0> CRASH REPORT Process <0.5451.0>
> with 0 neighbours exited with reason:
> {timeout,{gen_server,call,[<0.5141.0>,stop]}} in gen_server:terminate/6 line
> 744
>
> Our setup is as follows:
> We have a riak cluster with 10 nodes. The configuration of each node is as follows:
> RAM: 48GB
> Disk:
>   80GB (/)
>   504GB (separate riak partition)
> Riak Version: 2.1.3-1 (2.1.3)
> Data in riak: After observing the crash, total data in the riak partition was ~50GB
>
> Riak config is as follows:
> riak.conf
> [Attached with this email]
>
> advanced.config:
> [
>   {riak_kv, [{add_paths, ["/usr/local/lib/scale_riak/ebin"]}]},
>   {webmachine, [{backlog, 511}, {nodelay, true}]},
>   {yokozuna, [{solr_request_timeout, 120000}]}
> ].
>
> We have observed this a few times now, and after each crash latency
> increases and our application starts timing out.
> We would really like to understand what might be causing this crash, and if it
> is due to missing config on our nodes we would like to fix it.
>
> Thanks for your help in advance :-)
>
> Regards,
> Raviraj
> <riak.conf>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
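
The quoted console log mixes two distinct failure reasons (a missing function clause in riak_kv_vnode:handle_info and a gen_server call timeout), which is easy to miss when the entries are wrapped. Below is a minimal sketch, assuming console lines shaped like the ones quoted above (date, time, [level], <pid>, message, one entry per line), that tallies error entries by reason. The helper name summarize_crashes and the reason labels are hypothetical, chosen for illustration; this is not part of Riak or riak-debug.

```python
import re
from collections import Counter

# Assumed line shape, taken from the quoted log:
#   2016-02-19 16:25:26.255 [error] <0.2160.0> message...
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (<[\d.]+>) (.*)$"
)


def summarize_crashes(lines):
    """Count [error] entries by a coarse reason label (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group(3) != "error":
            continue  # skip non-error levels and lines that do not parse
        message = match.group(5)
        if "no function clause matching" in message:
            counts["function_clause"] += 1
        elif "{timeout," in message:
            counts["timeout"] += 1
        else:
            counts["other"] += 1
    return counts


# Two entries copied from the quoted log, each unwrapped onto one line.
sample = [
    "2016-02-19 16:25:26.278 [error] <0.4320.0> gen_server <0.4320.0> terminated "
    "with reason: no function clause matching "
    "riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}}, "
    "{state,268322566228720457638957762256505085639956365312,"
    "riak_kv_eleveldb_backend,true,{state,<<>>,...},...}) line 1195",
    "2016-02-19 16:25:26.809 [error] <0.5451.0> gen_server <0.5451.0> terminated "
    "with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}}",
]

print(summarize_crashes(sample))
```

Entries that were wrapped by the mail client need to be joined back onto one line before they will match; the full picture still comes from the riak-debug archive rather than from a summary like this.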