---------- Original message ----------
> From: Matthew Von-Maszewski 
> Date: 22 Oct 2012
> Subject: Re: Riak performance problems when LevelDB database grows beyond 16GB
> Jan,
> 
> ...
> The next question from me is whether the drive / disk array problems are your 
> only problem at this point.  The data in log_jan.txt looks ok until 
> the failures start.  I am willing to work more, but I need to better 
> understand your next level of problems.
>
> Matthew

Hi Matthew,

Thanks again for helping me. It was actually bad RAM. It took me some time to 
convince the hosting provider since the problem did not show up in their 
hardware tests. :-/

I ran a 4-day test after the two bad nodes were fixed, and the original issue 
(a Riak node getting stuck) did not appear again.

There are however other problems which seem to be caused by my "misuse" of 
LevelDB for storing short-lived data (the data is valid only for 24 hours).

Here is the application throughput during a 4-day test with 5 Riak nodes:

http://janevangelista.rajce.idnes.cz/nastenka#5Riak_4d_edited.jpg

This graph shows memory use on a Riak node:

http://janevangelista.rajce.idnes.cz/nastenka/#MemNode3-edited.jpg

(Memory use on the other nodes looks similar, but the OOM killer was not 
invoked there.)

And this graph shows disk space consumption on a Riak node:

http://janevangelista.rajce.idnes.cz/nastenka#DiskSpace-4d-edited.jpg

The OOM condition which killed one Riak node (and slowed down the other ones) 
seems to be caused by the map-reduce jobs which periodically delete old data 
from the database. The entries are deleted with a mapred job querying the 
secondary index and using the reduce function published at 
http://contrib.basho.com/delete_keys.html .
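
To make the shape of such a job concrete, here is a minimal sketch in Python 
that only builds the JSON description of a delete job (it does not talk to 
Riak). It assumes an integer secondary index holding the creation time in 
seconds; the bucket name, the index name and the Erlang module/function names 
of the reduce phase are placeholders, not the names used in my setup or on the 
contrib page.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60


def build_expiry_job(bucket, index, now, max_age=SECONDS_PER_DAY,
                     timeout_ms=60 * 60 * 1000):
    """Describe a MapReduce job that selects entries older than max_age.

    Assumption: `index` is an integer secondary index (name ending in
    "_int") that stores the creation time in seconds since the epoch.
    """
    cutoff = int(now) - max_age
    return {
        # Secondary-index range query used as the job input
        # (both ends of the range are inclusive).
        "inputs": {
            "bucket": bucket,
            "index": index,
            "start": 0,
            "end": cutoff,
        },
        "query": [
            {
                "reduce": {
                    "language": "erlang",
                    # Placeholder names for the delete reduce function.
                    "module": "my_delete_keys",
                    "function": "delete",
                }
            }
        ],
        # Job timeout in milliseconds; a value that is too short cancels
        # the job before all old entries have been deleted.
        "timeout": timeout_ms,
    }


if __name__ == "__main__":
    job = build_expiry_job("events", "created_int", now=1350900000)
    print(json.dumps(job, indent=2))
```

The resulting document would be POSTed to the node's /mapred HTTP endpoint. 
Everything older than the cutoff is selected by a single job here, which is 
probably part of why the jobs are so heavy.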

I wish LevelDB could expire old entries in the same way as Bitcask does. :-)

In an older 3-day test I had only a 5-minute timeout for the mapred jobs (a 
bug). It caused premature cancellation of the jobs deleting the old data - but 
the throughput was better:

http://janevangelista.rajce.idnes.cz/nastenka#4Riak_3d_8K_edited.jpg

The memory use looked reasonable as well:

http://janevangelista.rajce.idnes.cz/nastenka/#Memory-3d-edited.jpg

The disk use in this case was:

http://janevangelista.rajce.idnes.cz/nastenka#DiskSpace.jpg

The databases were approximately 85 GB.

So the only problem now seems to be how to get rid of the old data. Any hints?

Thanks, Jan
