Hi Jon, thank you for the answer. While my mail to this list was awaiting approval, I dug into my issue more deeply. And yes, you are right: neither {error, enotconn} nor max_concurrency is my problem.
I'm going to migrate my cluster entirely to eleveldb, i.e. I need to stop using bitcask. I talked with Basho support and they said that it is tricky to tune bitcask on servers with 32 GB RAM (and I guess that it is not tricky but impossible, because bitcask loads all keys into memory regardless of the free RAM available). With leveldb I have the opportunity to tune RAM usage on the servers. So I have 15 nodes with the multi-backend (bitcask for data and leveldb for metadata). 2 additional servers are without the multi-backend - leveldb only. Now I'm not sure whether I still need to use the multi-backend on the leveldb-only nodes.

And my problem is (as I mentioned earlier) the following: on the leveldb-only nodes I see handoffs timing out and making no further progress.

On the multi-backend hosts I have this configuration:

{riak_kv, [
    {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-1.5.0/ebin"]},
    {storage_backend, riak_cs_kv_multi_backend},
    {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
    {multi_backend_default, be_default},
    {multi_backend, [
        {be_default, riak_kv_eleveldb_backend, [
            {max_open_files, 50},
            {data_root, "/var/lib/riak/leveldb"}
        ]},
        {be_blocks, riak_kv_bitcask_backend, [
            {data_root, "/var/lib/riak/bitcask"}
        ]}
    ]},

And for the hosts with the leveldb-only backend:

{riak_kv, [
    {storage_backend, riak_kv_eleveldb_backend},
    ...
{eleveldb, [
    {data_root, "/var/lib/riak/leveldb"}
(default values for leveldb)

In the leveldb LOG files I see nothing that could help me (no errors in the logs).

On Mon, Oct 26, 2015 at 3:57 PM Jon Meredith <jmered...@basho.com> wrote:

> Hi,
>
> I suspect your {error,enotconn} messages are unrelated - that's likely to
> be caused by an HTTP client closing the connection while Riak looks up
> some networking information about the requestor.
>
> The max_concurrency message you are seeing is related to the handoff
> transfer limit - it should be labelled as informational.
> When a node has data to handoff it starts the handoff sender process, and
> if there are either too many local handoff processes or too many on the
> remote side it exits with max_concurrency. You could increase it with
> riak-admin transfer-limit, but that probably won't help if you're timing
> out.
>
> As you're using the multi-backend you're transferring data from bitcask
> and leveldb. The next place I would look is in the leveldb LOG files to
> see if there are any leveldb vnodes having problems that prevent repair.
>
> Jon
>
> On Mon, Oct 26, 2015 at 7:15 AM Vladyslav Zakhozhai <
> v.zakhoz...@smartweb.com.ua> wrote:
>
>> Hello,
>>
>> I have a problem with persistent timeouts during ownership handoffs.
>> I've searched the Internet and the current mailing list, but with no
>> success.
>>
>> I have a Riak 1.4.12 cluster with 17 nodes. Almost all nodes use the
>> multi-backend with bitcask and eleveldb as storage backends (we need
>> multiple backends for the Riak CS 1.5.0 integration).
>>
>> Now I'm working on migrating the Riak cluster to eleveldb as the primary
>> and only backend. For now I have 2 nodes with the eleveldb backend in
>> the same cluster.
>>
>> During the ownership handoff process I permanently see errors about
>> timed-out handoff receivers and senders.
>>
>> Here is partial output of riak-admin transfers:
>>
>> ...
>> transfer type: ownership_transfer
>> vnode type: riak_kv_vnode
>> partition: 331121464707782692405522344912282871640797216768
>> started: 2015-10-21 08:32:55 [46.66 min ago]
>> last update: no updates seen
>> total size: unknown
>> objects transferred: unknown
>>
>> unknown
>> riak@taipan.pleiad.uaprom =======> r...@eggeater.pleiad.uaprom
>> | | 0%
>> unknown
>>
>> transfer type: ownership_transfer
>> vnode type: riak_kv_vnode
>> partition: 336830455478606531929755488790080852186328203264
>> started: 2015-10-21 08:32:54 [46.68 min ago]
>> last update: no updates seen
>> total size: unknown
>> objects transferred: unknown
>> ...
>> Some of the partition handoffs never update their state; some of them
>> terminate after handing off part of the objects and never start again.
>>
>> I see nothing in the logs but the following.
>>
>> On the receiver side:
>>
>> 2015-10-21 11:33:55.131 [error]
>> <0.25390.1266>@riak_core_handoff_receiver:handle_info:105 Handoff receiver
>> for partition 331121464707782692405522344912282871640797216768 timed out
>> after processing 0 objects.
>>
>> On the sender side:
>>
>> 2015-10-21 11:01:58.879 [error] <0.13177.1401> CRASH REPORT Process
>> <0.13177.1401> with 0 neighbours crashed with reason: no function clause
>> matching webmachine_request:peer_from_peername({error,enotconn},
>> {webmachine_request,{wm_reqstate,#Port<0.50978116>,[],undefined,undefined,undefined,{wm_reqdata,...},...}})
>> line 150
>>
>> 2015-10-21 11:32:50.055 [error] <0.207.0> Supervisor
>> riak_core_handoff_sender_sup had child riak_core_handoff_sender started
>> with {riak_core_handoff_sender,start_link,undefined} at <0.22312.1090> exit
>> with reason max_concurrency in context child_terminated
>>
>> {error, enotconn} seems to be a network issue, but I have no problems
>> with the network. All hosts resolve their neighbors correctly, and
>> /etc/hosts on each node is correct.
>>
>> I've tried increasing handoff_timeout and handoff_receive_timeout, but
>> with no success.
>>
>> Forcing handoffs helped me, but only for a short period of time:
>>
>> rpc:multicall([node() | nodes()], riak_core_vnode_manager, force_handoffs, []).
>>
>> I see progress in the handoffs (riak-admin transfers), but then I see
>> handoffs timing out again.
>>
>> A week ago I joined 4 nodes with bitcask, and there were no such
>> problems.
>>
>> I'm a little confused and need to understand my next steps in
>> troubleshooting this issue.
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
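P.S. For completeness, a sketch of what a full leveldb-only section on those two nodes could look like. This is only an illustration: the max_open_files value is carried over from the be_default settings above as an example, while the leveldb-only nodes actually run with leveldb defaults.

```erlang
%% Sketch only -- max_open_files is an example carried over from
%% be_default; the leveldb-only nodes actually use default values.
{riak_kv, [
    {storage_backend, riak_kv_eleveldb_backend}
]},
{eleveldb, [
    {data_root, "/var/lib/riak/leveldb"},
    {max_open_files, 50}
]}
```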
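P.P.S. A small sketch of how I scan the leveldb LOG files that Jon mentioned. The DATA_ROOT path is the data_root from the configs above; adjust it per node.

```shell
# Sketch: scan every leveldb vnode's LOG file for error/corruption lines.
# DATA_ROOT matches the data_root in the configs above; adjust per node.
DATA_ROOT="${DATA_ROOT:-/var/lib/riak/leveldb}"
# leveldb appends compaction and corruption messages to per-vnode LOG files.
grep -riE 'error|corrupt' "$DATA_ROOT" --include=LOG \
  || echo "no matching lines (or ${DATA_ROOT} not present)"
```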