Thanks Matthew. I will try one of those solutions
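[Editor's note: the memory arithmetic behind Matthew's diagnosis below can be sketched as follows. The total byte count comes from the quoted LOG line; the ~20% AAE share and the ring size of 64 are as stated in the thread. The even split across vnodes is a simplifying assumption, not something the thread confirms.]

```shell
# Sketch: rough per-vnode AAE memory budget from the quoted LOG values.
# Assumption: ~20% of total leveldb memory goes to AAE vnodes (per Matthew),
# divided evenly across a ring of 64 vnodes.
total_leveldb_mem=2901766963           # Options.total_leveldb_mem from the LOG
aae_share=$((total_leveldb_mem / 5))   # ~20% reserved for AAE
ring_size=64
per_vnode=$((aae_share / ring_size))
echo "per-AAE-vnode budget: ${per_vnode} bytes"   # ~9 MB
```

About 9 MB per AAE vnode, while the first vnode's file and block caches alone (5833527 + 7930679 bytes in the LOG) already exceed that, which is consistent with later vnodes reporting cache sizes of 0.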
On Tue, Feb 14, 2017 at 3:51 PM, Matthew Von-Maszewski <matth...@basho.com> wrote:

> Arun,
>
> You are running out of RAM for the leveldb AAE. There are several ways to
> fix that:
>
> - reduce memory allocated to bitcask
> - more memory per server
> - more servers of the same memory
> - reduce the ring size from 64 to 8, and rebuild data within the cluster
>   from scratch
> - lie to leveldb and give it a bigger-than-real memory setting in riak.conf:
>   leveldb.maximum_memory=8G
>
> The key LOG lines are:
>
>     Options.total_leveldb_mem: 2,901,766,963  <-- this is the total memory
>                                                   assigned to ALL of leveldb,
>                                                   but only 20% of it goes to
>                                                   AAE vnodes
>
>     File cache size: 5833527   <-- the first vnode says, cool, enough memory for me
>     Block cache size: 7930679  <-- ditto
>
> ... but as more vnodes start:
>
>     File cache size: 0   <-- things are just not going to work well
>     Block cache size: 0
>
> There are no actual file system error messages in your LOG files. That
> supports that the real problem is memory unhappiness.
>
> Matthew
>
> On Feb 14, 2017, at 3:34 PM, Arun Rajagopalan <arun.v.rajagopa...@gmail.com> wrote:
>
> Hi Matthew, Magnus
>
> I have attached the log files for your review
>
> Thanks
> Arun
>
> On Tue, Feb 14, 2017 at 11:55 AM, Matthew Von-Maszewski <matth...@basho.com> wrote:
>
>> Arun,
>>
>> The AAE code uses leveldb for its storage of anti-entropy data, no matter
>> which backend holds the user data. Therefore the error below suggests
>> corruption within leveldb files (which is not impossible, but becoming
>> really rare except with bad hardware or full disks).
>>
>> Before wiping out the AAE directory, you should copy the LOG file within
>> it. There are likely more useful error messages within that file ... maybe
>> put the file in Dropbox or zip-attach it to a reply for us to review.
>>
>> Matthew
>>
>> On Feb 14, 2017, at 10:42 AM, Magnus Kessler <mkess...@basho.com> wrote:
>>
>> On 14 February 2017 at 14:46, Arun Rajagopalan <arun.v.rajagopa...@gmail.com> wrote:
>>
>>> Hi Magnus
>>>
>>> Riak crashes on startup when I have a truncated bitcask file.
>>>
>>> It also crashes when the AAE files are bad, I think. Example below:
>>>
>>> 2017-02-13 21:18:30 =CRASH REPORT====
>>>   crasher:
>>>     initial call: riak_kv_index_hashtree:init/1
>>>     pid: <0.6037.0>
>>>     registered_name: []
>>>     exception exit: {{{badmatch,{error,{db_open,"Corruption: truncated record at end of file"}}},
>>>       [{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,675}]},
>>>        {hashtree,new,2,[{file,"src/hashtree.erl"},{line,246}]},
>>>        {riak_kv_index_hashtree,do_new_tree,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,610}]},
>>>        {lists,foldl,3,[{file,"lists.erl"},{line,1248}]},
>>>        {riak_kv_index_hashtree,init_trees,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,474}]},
>>>        {riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},{line,268}]},
>>>        {gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},
>>>        {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]},
>>>      [{gen_server,init_it,6,[{file,"gen_server.erl"},{line,328}]},
>>>       {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
>>>     ancestors: [<0.715.0>,riak_core_vnode_sup,riak_core_sup,<0.160.0>]
>>>     messages: []
>>>     links: []
>>>     dictionary: []
>>>     trap_exit: false
>>>     status: running
>>>     heap_size: 1598
>>>     stack_size: 27
>>>     reductions: 889
>>>     neighbours:
>>>
>>> Regards
>>> Arun
>>
>> Hi Arun,
>>
>> The crash log you provided shows that there is a corrupted file in the
>> AAE (anti_entropy) backend. Entries in console.log should have more
>> information about which partition is affected. Please post output from the
>> affected node at around 2017-02-13T21:18:30.
>> As this is AAE data, it is
>> safe to remove the directory named after the affected partition from the
>> anti_entropy directory before restarting the node. You may find that
>> there is more than one affected partition; the next one will only be
>> encountered on the subsequent restart attempt. If this is the case, simply
>> identify the next partition in the same way and remove it, too, until the
>> node starts up successfully again.
>>
>> Is there a reason why the nodes aren't shut down in the regular way?
>>
>> Kind Regards,
>>
>> Magnus
>>
>> --
>> Magnus Kessler
>> Client Services Engineer
>> Basho Technologies Limited
>>
>> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> <aaeLOG.tar>
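[Editor's note: Magnus's removal procedure can be sketched as a small shell helper. This is not Basho-supplied tooling; the `anti_entropy` path and the partition number must come from your own `platform_data_dir` setting and `console.log` entries around the crash time.]

```shell
# Minimal sketch: remove one corrupted AAE tree so Riak rebuilds it on the
# next start. The directory layout (one subdirectory per partition under the
# anti_entropy directory) is as described in the thread; paths are examples.
remove_aae_tree() {
  aae_dir="$1"       # e.g. /var/lib/riak/anti_entropy (depends on your install)
  partition="$2"     # partition number taken from console.log
  if [ -d "$aae_dir/$partition" ]; then
    rm -rf "$aae_dir/$partition"
    echo "removed $partition"
  else
    echo "no AAE tree for $partition under $aae_dir"
  fi
}

# Usage (hypothetical partition number):
#   remove_aae_tree /var/lib/riak/anti_entropy 5480631139990885943...
# Then restart the node; repeat for the next partition if it crashes again.
```

Since AAE data is derived from the user data, deleting a tree costs only the time to rebuild it, which is why this is safe while removing bitcask files is not.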