Hey Kelly,

Thanks for getting back to me.

You were right to bring up that point: these settings were indeed applied gradually. I have therefore started from scratch, with the same settings from my earlier message (quoted below) in place.

I made 3 batches of 48 uploads of the same 32 MB files to 48 different keys in S3, for a total of 144 uploads. I wound up with 48 keys in S3 (later uploads overwrote the old data), each 32 MB in size.

BTW, I also forgot to mention that n_val is set to 1 in default_bucket_props.

The Bitcask dir was around 5.5 GB, and after merges kicked in it shrank to 3.4 GB. Still, the actual data-set size should be 48 x 32 MB, which is 1.5 GB.

I also noticed that each time I upload a file, 2x its size is used right away, and I'm guessing that's related :-)

The single Riak node is running on CentOS 6.3 with the 1.3.1 packaged version.

Thanks,

Idan Shinberg
idomoo

On Wed, May 22, 2013 at 2:26 AM, Kelly McLaughlin <[email protected]> wrote:

> Idan,
>
> Bitcask can sometimes be slow to reclaim space after deleting objects from
> Riak CS. Are the settings you included the settings that have been in place
> during all of your uploads and deletions? I am surprised that just a few
> tens of uploads of 32 MB objects used up 15 GB of space. Can you be more
> specific on a count of uploads? Also do you have any error output in the
> riak or riak cs log files that may be related? Finally, which packages are
> you using for your testing?
>
> Kelly
>
>
> On Tue, May 21, 2013 at 2:18 PM, Idan Shinberg
> <[email protected]> wrote:
>
>> Thus , I fear Riak never treats their data as "dead-bytes" and they never
>> get merged
>>
>> I created 2 buckets using s3cmd and made several tens of uploads of 32mb
>> sized files , deleting them right afterwards ( with proper s3cmd commands ,
>> of course) .
>>
>> I ended up with no buckets and no keys in my riak s3 database ,
>> however , directory /var/lib/riak/bitcask/ 64 partitions now occupy 15GB
>> worth of space
>>
>> several riak restarts did not trigger any merges , and my merge settings
>> are set to impose very though merge triggering criterias , So I'm guessing
>> the only reason the data is not being cleared is the fact that it's still
>> in use ...
>>
>> Relevant riak-cs config :
>>
>>     %% == Garbage Collection ==
>>
>>     %% The number of seconds to retain the block
>>     %% for an object after it has been deleted.
>>     %% This leeway time is set to give the delete
>>     %% indication time to propogate to all replicas.
>>     %% 86400 is 24-hours.
>>     {leeway_seconds, 30},
>>
>>     %% How often the garbage collection daemon
>>     %% waits in-between gc batches.
>>     %% 900 is 15-minutes.
>>     {gc_interval, 60},
>>
>>     %% How long a move to the garbage
>>     %% collection to do list can remain
>>     %% failed, before we retry it.
>>     %% 21600 is 6-hours.
>>     {gc_retry_interval,300},
>>
>>
>>
>> Relevant Riak Config
>>
>> {riak_kv, [
>>     %% Storage_backend specifies the Erlang module defining the storage
>>     %% mechanism that will be used on this node.
>>     {add_paths, ["/usr/lib64/riak-cs/lib/riak_cs-1.3.1/ebin"]},
>>     {storage_backend, riak_cs_kv_multi_backend},
>>     {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
>>     {multi_backend_default, be_default},
>>     {multi_backend, [
>>         {be_default, riak_kv_eleveldb_backend, [
>>             {max_open_files, 50},
>>             {data_root, "/var/lib/riak/leveldb"}
>>         ]},
>>         {be_blocks, riak_kv_bitcask_backend, [
>>
>>             {max_file_size, 16#4000000}, %% 64MB
>>
>>             %% Trigger a merge if any of the following are true:
>>             {frag_merge_trigger, 10}, %% fragmentation >= 10%
>>             {dead_bytes_merge_trigger, 33554432}, %% dead bytes > 32 MB
>>
>>             %% Conditions that determine if a file will be examined during a merge:
>>             {frag_threshold, 5}, %% fragmentation >= 5%
>>             {dead_bytes_threshold, 8388608}, %% dead bytes > 8 MB
>>             {small_file_threshold, 16#80000000}, %% file is < 2GB
>>
>>             {data_root, "/var/lib/riak/bitcask"}
>>         ]}
>>     ]},
>>
>> ...
>> ...
>> ...
>>
>> {bitcask, [
>>     %% Configure how Bitcask writes data to disk.
>>     %%   erlang: Erlang's built-in file API
>>     %%   nif: Direct calls to the POSIX C API
>>     %%
>>     %% The NIF mode provides higher throughput for certain
>>     %% workloads, but has the potential to negatively impact
>>     %% the Erlang VM, leading to higher worst-case latencies
>>     %% and possible throughput collapse.
>>     {io_mode, erlang},
>>
>>     {max_file_size, 16#4000000}, %% 64MB
>>     {merge_window, always}, %% Span of hours during which merge is acceptable.
>>
>>     %% Trigger a merge if any of the following are true:
>>     {frag_merge_trigger, 10}, %% fragmentation >= 10%
>>     {dead_bytes_merge_trigger, 33554432}, %% dead bytes > 32 MB
>>
>>     %% Conditions that determine if a file will be examined during a merge:
>>     {frag_threshold, 5}, %% fragmentation >= 5%
>>     {dead_bytes_threshold, 8388608}, %% dead bytes > 8 MB
>>     {small_file_threshold, 16#80000000}, %% file is < 2GB
>>
>>     {data_root, "/var/lib/riak/bitcask"}
>>
>> ]},
>>
>> I do see merges taking place in riak's console.log , they're just not
>> making that much of a difference ...
>>
>> Any idea what I might be missing here ?
>>
>> Thanks
>>
>> Idan Shinberg
>> idomoo
>>
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
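
As a quick sanity check on the figures in the reply at the top of this message, here is a minimal Python sketch of the space arithmetic. The variable names are purely illustrative (they are not Riak or Riak CS settings), the inputs are the numbers reported in this thread, and it assumes n_val = 1 so that no replication factor applies. It only compares sizes; it does not model how Bitcask or the Riak CS garbage collector actually reclaim space.

```python
# Figures reported in this thread (MB/GB treated as binary units).
keys = 48               # distinct S3 keys left after the test
object_mb = 32          # size of each uploaded file, in MB
batches = 3             # each batch overwrote the same 48 keys
before_merge_gb = 5.5   # Bitcask dir size before merges
after_merge_gb = 3.4    # Bitcask dir size after merges

total_uploads = batches * keys          # every upload performed
live_mb = keys * object_mb              # data that should still be live
written_mb = total_uploads * object_mb  # everything ever written

live_gb = live_mb / 1024
written_gb = written_mb / 1024

# How much larger the Bitcask dir is than the live data set.
overhead_before = before_merge_gb / live_gb
overhead_after = after_merge_gb / live_gb

print(f"uploads performed : {total_uploads}")
print(f"live data         : {live_gb:.1f} GB")
print(f"total written     : {written_gb:.1f} GB")
print(f"on disk / live    : {overhead_before:.2f}x before merge, "
      f"{overhead_after:.2f}x after merge")
```

The post-merge ratio comes out a little above 2x, which is in the same range as the "2x its size per upload" observation, so the two symptoms may well share a cause.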
