Idan,

I'll investigate this a bit and see if I can replicate similar behavior and
hopefully I can get back to you with more information. Thanks for sharing
the info.
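
For reference, the sizes reported in the quoted message below can be sanity-checked with a few lines of Python. This is only a rough sketch: the helper names are made up for illustration, 1 GB is assumed to mean 1024 MB, and Riak CS manifest and Bitcask per-entry overhead are ignored.

```python
# Back-of-the-envelope check of the sizes reported in the thread.
# Assumption (not from the thread): 1 GB is treated as 1024 MB.
MB_PER_GB = 1024


def live_data_mb(keys: int, object_mb: int) -> int:
    """Size of the data that should remain once old versions are merged away."""
    return keys * object_mb


def total_written_mb(keys: int, object_mb: int, batches: int) -> int:
    """Size of everything written, counting each overwrite as a full upload."""
    return keys * object_mb * batches


def overhead_ratio(observed_gb: float, expected_gb: float) -> float:
    """How many times larger the on-disk footprint is than the live data."""
    return observed_gb / expected_gb


if __name__ == "__main__":
    live_gb = live_data_mb(48, 32) / MB_PER_GB
    written_gb = total_written_mb(48, 32, 3) / MB_PER_GB
    print(f"live data:     {live_gb:.1f} GB")
    print(f"total written: {written_gb:.1f} GB")
    print(f"after merges:  {overhead_ratio(3.4, live_gb):.2f}x the live data")
```

Under these assumptions the 48 live objects account for 1.5 GB and the 144 uploads for 4.5 GB in total, so the 3.4 GB left after merging is a bit more than twice the live data.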

Kelly


On Wed, May 22, 2013 at 3:23 AM, Idan Shinberg <[email protected]> wrote:

> Hey Kelly,
>
> Thanks for getting back to me.
>
> You were right to bring up the point: these settings were indeed
> applied gradually.
>
> I have therefore started from scratch with the same settings mentioned
> above in place.
>
> I made 3 batches of 48 uploads of the same 32 MB files to 48 different
> keys in S3, for a total of 144 uploads. I wound up with 48 keys in S3
> (later uploads overwrote the old data), each 32 MB in size.
>
> BTW, I also forgot to mention that n_val is set to 1 in
> default_bucket_props. The Bitcask dir was around 5.5 GB, and after merges
> kicked in it shrank to 3.4 GB.
>
> Still, the actual data-set size should be 48 x 32 MB, which is 1.5 GB.
> I also noticed that each time I upload a file, 2x its size is
> automatically used, and I'm guessing that's related :-)
>
> The single Riak node is running on CentOS 6.3 with the 1.3.1 packaged
> version.
>
>
> Thanks
>
> Idan Shinberg
> idomoo
>
>
> On Wed, May 22, 2013 at 2:26 AM, Kelly McLaughlin <[email protected]> wrote:
>
>> Idan,
>>
>> Bitcask can sometimes be slow to reclaim space after deleting objects
>> from Riak CS. Are the settings you included the settings that have been in
>> place during all of your uploads and deletions? I am surprised that just a
>> few tens of uploads of 32 MB objects used up 15 GB of space. Can you be
>> more specific on a count of uploads? Also do you have any error output in
>> the riak or riak cs log files that may be related? Finally, which packages
>> are you using for your testing?
>>
>> Kelly
>>
>>
>> On Tue, May 21, 2013 at 2:18 PM, Idan Shinberg
>> <[email protected]> wrote:
>>
>>> Thus, I fear Riak never treats their data as "dead bytes" and they
>>> never get merged.
>>>
>>> I created 2 buckets using s3cmd and made several tens of uploads of
>>> 32 MB files, deleting them right afterwards (with the proper s3cmd
>>> commands, of course).
>>>
>>> I ended up with no buckets and no keys in my Riak S3 database;
>>> however, the 64 partitions under /var/lib/riak/bitcask/ now occupy
>>> 15 GB of space.
>>>
>>> Several Riak restarts did not trigger any merges, and my merge settings
>>> impose very tough merge-triggering criteria, so I'm guessing
>>> the only reason the data is not being cleared is that it's still
>>> in use.
>>>
>>> Relevant riak-cs config :
>>>
>>>               %% == Garbage Collection ==
>>>
>>>               %% The number of seconds to retain the block
>>>               %% for an object after it has been deleted.
>>>               %% This leeway time is set to give the delete
>>>               %% indication time to propogate to all replicas.
>>>               %% 86400 is 24-hours.
>>>               {leeway_seconds, 30},
>>>
>>>               %% How often the garbage collection daemon
>>>               %% waits in-between gc batches.
>>>               %% 900 is 15-minutes.
>>>               {gc_interval, 60},
>>>
>>>               %% How long a move to the garbage
>>>               %% collection to do list can remain
>>>               %% failed, before we retry it.
>>>               %% 21600 is 6-hours.
>>>               {gc_retry_interval, 300},
>>>
>>>
>>>
>>> Relevant Riak Config
>>>
>>> {riak_kv, [
>>>             %% Storage_backend specifies the Erlang module defining the storage
>>>             %% mechanism that will be used on this node.
>>>                 {add_paths, ["/usr/lib64/riak-cs/lib/riak_cs-1.3.1/ebin"]},
>>>                 {storage_backend, riak_cs_kv_multi_backend},
>>>                 {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
>>>                 {multi_backend_default, be_default},
>>>                 {multi_backend, [
>>>                     {be_default, riak_kv_eleveldb_backend, [
>>>                         {max_open_files, 50},
>>>                         {data_root, "/var/lib/riak/leveldb"}
>>>                     ]},
>>>                     {be_blocks, riak_kv_bitcask_backend, [
>>>
>>>                         {max_file_size, 16#4000000}, %% 64MB
>>>
>>>                         %% Trigger a merge if any of the following are true:
>>>                         {frag_merge_trigger, 10}, %% fragmentation >= 10%
>>>                         {dead_bytes_merge_trigger, 33554432}, %% dead bytes > 32 MB
>>>
>>>                         %% Conditions that determine if a file will be examined during a merge:
>>>                         {frag_threshold, 5}, %% fragmentation >= 5%
>>>                         {dead_bytes_threshold, 8388608}, %% dead bytes > 8 MB
>>>                         {small_file_threshold, 16#80000000}, %% file is < 2GB
>>>
>>>                         {data_root, "/var/lib/riak/bitcask"}
>>>                     ]}
>>>                 ]},
>>>
>>> ...
>>> ...
>>> ...
>>>
>>>  {bitcask, [
>>>              %% Configure how Bitcask writes data to disk.
>>>              %%   erlang: Erlang's built-in file API
>>>              %%      nif: Direct calls to the POSIX C API
>>>              %%
>>>              %% The NIF mode provides higher throughput for certain
>>>              %% workloads, but has the potential to negatively impact
>>>              %% the Erlang VM, leading to higher worst-case latencies
>>>              %% and possible throughput collapse.
>>>              {io_mode, erlang},
>>>
>>>              {max_file_size, 16#4000000}, %% 64MB
>>>              {merge_window, always}, %% Span of hours during which merge is acceptable.
>>>
>>>              %% Trigger a merge if any of the following are true:
>>>              {frag_merge_trigger, 10}, %% fragmentation >= 10%
>>>              {dead_bytes_merge_trigger, 33554432}, %% dead bytes > 32 MB
>>>
>>>              %% Conditions that determine if a file will be examined during a merge:
>>>              {frag_threshold, 5}, %% fragmentation >= 5%
>>>              {dead_bytes_threshold, 8388608}, %% dead bytes > 8 MB
>>>              {small_file_threshold, 16#80000000}, %% file is < 2GB
>>>
>>>              {data_root, "/var/lib/riak/bitcask"}
>>>
>>>            ]},
>>>
>>> I do see merges taking place in Riak's console.log; they're just not
>>> making much of a difference.
>>>
>>> Any idea what I might be missing here?
>>>
>>> Thanks
>>>
>>> Idan Shinberg
>>> idomoo
>>>
>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> [email protected]
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>
