---------- Original message ----------
> From: Matthew Von-Maszewski
> Date: 22. 10. 2012
> Subject: Re: Riak performance problems when LevelDB database grows beyond 16GB
>
> Jan,
>
> ...
> The next question from me is whether the drive / disk array problems are your
> only problem at this point. The data in log_jan.txt looks ok until
> the failures start. I am willing to work more, but I need to better
> understand your next level of problems.
>
> Matthew
Hi Matthew,

thanks again for helping me. It was actually bad RAM. It took me some time to convince the hosting provider, since the problem did not show up in their hardware tests. :-/

After the two bad nodes were fixed, I ran a 4-day test, and the original issue (a Riak node getting stuck) did not appear again. There are, however, other problems, which seem to be caused by my "misuse" of LevelDB for storing short-lived data (the data is valid for only 24 hours).

Here is the application throughput during the 4-day test with 5 Riak nodes:
http://janevangelista.rajce.idnes.cz/nastenka#5Riak_4d_edited.jpg

This graph shows memory use on a Riak node:
http://janevangelista.rajce.idnes.cz/nastenka/#MemNode3-edited.jpg
(Memory use on the other nodes looks similar, but the OOM killer was not invoked there.)

And this graph shows disk space consumption on a Riak node:
http://janevangelista.rajce.idnes.cz/nastenka#DiskSpace-4d-edited.jpg

The OOM condition which killed one Riak node (and slowed down the other ones) seems to be caused by the map-reduce jobs which periodically delete old data from the database. The entries are deleted with a mapred job that queries the secondary index and uses the reduce function published at http://contrib.basho.com/delete_keys.html . I wish LevelDB could expire old entries in the same way as Bitcask does. :-)

In an older 3-day test I had only a 5 min timeout for the mapred jobs (a bug). It caused premature cancellation of the jobs deleting the old data, but the throughput was better:
http://janevangelista.rajce.idnes.cz/nastenka#4Riak_3d_8K_edited.jpg

The memory use looked reasonable as well:
http://janevangelista.rajce.idnes.cz/nastenka/#Memory-3d-edited.jpg

The disk use in this case was:
http://janevangelista.rajce.idnes.cz/nastenka#DiskSpace.jpg
The databases were approx. 85 GB.

So the only problem now seems to be how to get rid of the old data. Any hints?
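For context, the cleanup job described above has roughly the shape sketched below. This is only an illustration, not the exact job used in the tests: the bucket name, the index name, and the Erlang module/function names are hypothetical placeholders (the actual reduce function is the one published at the contrib.basho.com link above), and it assumes the secondary index holds the entry creation time as integer epoch seconds. The job spec follows the JSON format accepted by Riak's HTTP /mapred endpoint, with the job timeout given in milliseconds.

```python
import json
import time

RETENTION_SECONDS = 24 * 60 * 60  # the data is valid for 24 hours only


def build_delete_job(bucket, index, now, timeout_ms):
    """Build a MapReduce job spec that selects expired keys via a
    secondary-index range query and hands them to a deleting reduce phase."""
    cutoff = int(now) - RETENTION_SECONDS
    return {
        # 2i range query: every entry whose index value is <= cutoff
        "inputs": {"bucket": bucket, "index": index, "start": 0, "end": cutoff},
        "query": [
            {
                "reduce": {
                    "language": "erlang",
                    # Hypothetical placeholders for the contrib delete function
                    "module": "delete_keys_placeholder",
                    "function": "delete",
                }
            }
        ],
        # Job timeout in milliseconds
        "timeout": timeout_ms,
    }


# Hypothetical bucket and index names; 5 min timeout as in the older 3-day test
job = build_delete_job("entries", "created_int", time.time(), 5 * 60 * 1000)
print(json.dumps(job, indent=2))
```

The printed JSON would be POSTed to a node's /mapred URL. A single job covers the whole expired range, which is presumably why a long-running job touches so many keys at once.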
Thanks,
Jan

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
