IIRC, CS stores ~3 x 1MB entries for each 1MB of object that you store at
the default n_val. So, 300k objects * 9 blocks * 3 replicas = 8.1M entries,
* 150 bytes each (to be conservative) = ~1.2 GB. There will be some other
static and per-entry overheads, but you should be able to do better than
that before you OOM, even on a 1-node system.
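That back-of-envelope estimate can be sketched as follows (the 150-byte
per-entry figure is the conservative assumption from above, not a measured
value):

```python
# Rough estimate of bitcask keydir memory for Riak CS block storage.
# Riak CS chunks each object into ~1 MB blocks, and each block is
# stored n_val times, so every megabyte of object data becomes
# n_val keydir entries.
objects = 300_000          # number of S3 objects stored
object_mb = 9              # ~9 MB per object -> ~9 blocks per object
n_val = 3                  # default replication factor
bytes_per_entry = 150      # assumed conservative per-key keydir overhead

entries = objects * object_mb * n_val      # total keydir entries
keydir_bytes = entries * bytes_per_entry   # resident memory for the keydir

print(f"{entries:,} entries, ~{keydir_bytes / 1e9:.1f} GB of keydir memory")
```

On the numbers from this thread that works out to 8.1 million entries and
roughly 1.2 GB, well under the 12 GB of RAM on the box.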

It might be best to start again with all of the default config files from
before you started tuning and retry; there may be a subtle error in your
configs that's causing an issue.


On Tue, Aug 20, 2013 at 4:31 PM, Idan Shinberg <[email protected]> wrote:

> Thank you all for your kind and quick answers.
>
> However, even on a 3-node or 5-node cluster,
> we're still seeing memory bloat (only noticeably more slowly, as the load
> is distributed between more machines).
>
> It's important to stress that this is a "read-append"-only cluster: the
> data never expires, and from the moment the cluster is up, we keep adding
> data in the form of S3 PUTs (of around 9MB objects) until we reach around
> 300K PUTs.
>
> This is also why merges don't happen (no stale data).
>
> Has anyone come across this situation in the past?
> Does Riak even fit a use case like this?
>
>
> Regards,
>
> Idan Shinberg
>
>
> System Architect
>
> Idomoo Ltd.
>
>
>
> Mob +972.54.562.2072
>
> email [email protected]
>
> web www.idomoo.com
>
>
>
> On Tue, Aug 20, 2013 at 11:32 AM, Erik Søe Sørensen <[email protected]> wrote:
>
>> Your max file size is (far!) less than your small file size threshold -
>> which means that at each merge, *all* of the files will participate in the
>> merge. No wonder you need a lot of simultaneously open files... and long
>> merge times too, of course.
>> Try changing these parameters.
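For illustration only (these values are placeholder examples, not tuned
recommendations), a be_blocks section where small_file_threshold sits below
max_file_size, so a merge does not automatically pull in every data file,
would look like:

```erlang
%% Example only: small_file_threshold kept *below* max_file_size.
{be_blocks, riak_kv_bitcask_backend, [
    {max_file_size, 16#80000000},        %% 2 GB per data file
    {small_file_threshold, 16#2000000},  %% only files under 32 MB count as "small"
    {data_root, "/var/lib/riak/bitcask"}
]}
```

With the original settings (32 MB max file size, 2 GB small-file threshold),
every file is always "small" and therefore always a merge candidate.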
>>
>>
>>
>> -------- Oprindelig meddelelse --------
>> Fra: Idan Shinberg <[email protected]>
>> Dato:
>> Til: riak-users <[email protected]>
>> Cc: Arik Katsav <[email protected]>,Assaf Fogel <[email protected]>
>> Emne: Riak Memory Bloat issues with RiakCS/BitCask
>>
>>
>> Hi all
>>
>> We have a ~300GB Riak single-node cluster.
>> This seemed to work fine (merging worked well) until an
>> open-files/open-ports limit was reached (since then, we've raised both to
>> 64K).
>> That error caused a crash which left corrupted hint files. We
>> deleted the hint files (and their corresponding data files) to allow Riak
>> a clean start (no errors upon start).
>>
>> However, merges have not really been working (taking forever to
>> complete) since then, therefore causing:
>>
>>  *   Huge bloat on disk (the data is around 150K objects of roughly 8MB
>> each, but the Riak storage used has already more than quadrupled in size,
>> to around 1.2 TB)
>>  *   Huge bloat in memory, which eventually kills Riak itself (OOM
>> killer)
>>
>> We're not doing anything complex, just using Riak and Riak CS to emulate
>> S3 access (and only that) for roughly 15 client writes per minute.
>>
>> Our merge settings (uber-low, but they worked correctly up until
>> a few days ago):
>>
>>  {riak_kv, [
>>             %% storage_backend specifies the Erlang module defining the
>>             %% storage mechanism that will be used on this node.
>>                 {add_paths, ["/usr/lib64/riak-cs/lib/riak_cs-1.3.1/ebin"]},
>>                 {storage_backend, riak_cs_kv_multi_backend},
>>                 {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
>>                 {multi_backend_default, be_default},
>>                 {multi_backend, [
>>                     {be_default, riak_kv_eleveldb_backend, [
>>                         {max_open_files, 50},
>>                         {data_root, "/var/lib/riak/leveldb"}
>>                     ]},
>>                     {be_blocks, riak_kv_bitcask_backend, [
>>                         {max_file_size, 16#2000000}, %% 32 MB
>>
>>                         %% Trigger a merge if any of the following are true:
>>                         {frag_merge_trigger, 10}, %% fragmentation >= 10%
>>                         {dead_bytes_merge_trigger, 8388608}, %% dead bytes > 8 MB
>>
>>                         %% Conditions that determine if a file will be
>>                         %% examined during a merge:
>>                         {frag_threshold, 5}, %% fragmentation >= 5%
>>                         {dead_bytes_threshold, 2097152}, %% dead bytes > 2 MB
>>                         {small_file_threshold, 16#80000000}, %% file is < 2 GB
>>
>>                         {data_root, "/var/lib/riak/bitcask"},
>>                         {log_needs_merge, true}
>>                     ]}
>>                 ]},
>>
>> As you can see, log_needs_merge is set to true, and our logs do get
>> filled with needs_merge messages such as this one:
>>
>> 2013-08-19 00:09:49.043 [info] <0.17972.0>
>> "/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728"
>> needs_merge:
>> [{"/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/1153.bitcask.data",[{small_file,20506434}]},{"/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/1152.bitcask.data",[{small_file,33393237}]},{"/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/1151.bitcask.data",[{small_file,33123254}]},{"/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/1150.bitcask.data",[{small_file,32505520}]},{"/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/1149.
>> ...
>> ...
>> ...
>>
>> Yet only a single merge actually happened (and only around 20 minutes
>> after we started putting pressure on Riak):
>>
>> 2013-08-19 00:17:29.456 [info] <0.18964.14> Merged
>> {["/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/712.bitcask.data","/var/lib/riak/
>>
>> bitcask/388211372416021087647853783690262677096107081728/711.bitcask.data","/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/710.bitcask
>>
>> .data","/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/709.bitcask.data","/var/lib/riak/bitcask/38821137241602108764785378369026267709
>>
>> 6107081728/708.bitcask.data","/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/707.bitcask.data","/var/lib/riak/bitcask/388211
>> ...
>> ...
>> ...
>> var/lib/riak/bitc
>>
>> ask/388211372416021087647853783690262677096107081728/697.bitcask.data","/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/696.bitcask.dat
>>
>> a","/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/695.bitcask.data","/var/lib/riak/bitcask/388211372416021087647853783690262677096107
>>
>> 081728/694.bitcask.data","/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/693.bitcask.data","/var/lib/riak/bitcask/38821137241602108764
>>
>> 7853783690262677096107081728/692.bitcask.data","/var/lib/riak/bitcask/388211372416021087647853783690262677096107081728/691.bitcask.data","/var/lib/riak/bitcas
>> k/38821137241602108...",...],...} in 1325.611982 seconds.
>>
>> Is it reasonable for a merge to take more than 20 minutes?
>> Especially given that Riak's memory usage is bloating much faster?
>> Will scaling from a single node to a 3-node cluster ease the
>> problem?
>>
>> As for the server and usage specs:
>>
>> - A virtual machine with around 8 virtual cores
>> - 12 GB of RAM
>> - 8 TB of storage composed of 4 x 2TB disks in RAID 10 (4 TB available
>> storage)
>> - ~150K keys, each several tens of bytes long (using Riak CS for S3 storage)
>> - ~8MB value size for each key (raw file)
>> - ~22,000 open files (mostly hint files) held by Riak
>> - Replication factor of 1
>> - Ring size of 64
>>
>> I'll provide the logs if needed, though I doubt they'll prove useful.
>>
>> Any ideas/advice would be appreciated.
>>
>>
>> Regards,
>>
>> Idan Shinberg
>>
>>
>> System Architect
>>
>> Idomoo Ltd.
>>
>>
>>
>> Mob +972.54.562.2072
>>
>> email [email protected]<mailto:[email protected]>
>>
>> web www.idomoo.com<http://www.idomoo.com/>
>>
>> [cid:[email protected]]
>>
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

