Matthew,

That's very helpful, thank you! I agree, the ring size change is probably
the most significant factor; we probably should not have made that change on
this cluster, considering it is the smallest of our production clusters and
its growth rate is very low.

Daniel

On Tue, May 2, 2017 at 11:01 AM, Matthew Von-Maszewski <matth...@basho.com>
wrote:

> Daniel,
>
> 1 GB of RAM is an interesting and untested challenge.  Others have
> succeeded in getting regular Riak KV to operate on a Raspberry Pi with
> 2 GB of RAM.
>
> Here are the 3 things you can do in an attempt to drive down required RAM
> (in order of priority):
>
> 1.  reduce the ring size: in riak.conf set "ring_size = 16" (see the
> combined riak.conf sketch after this list).  It is currently 128.  You
> will have to rebuild your dataset from scratch.
>
> 2.  reduce the memory model used by leveldb from "normal" to "developer":
>
>         {multi_backend, [
>             {be_default, riak_kv_eleveldb_backend, [
>                 {data_root, "/opt/data/ecrypt/riak/leveldb"},
>                 {limited_developer_mem, true}
>             ]},
>             {be_blocks, riak_kv_eleveldb_backend, [
>                 {data_root, "/opt/data/ecrypt/riak/blocks"},
>                 {limited_developer_mem, true}
>             ]}
>         ]}
>
> 3.  disable the active anti-entropy feature:  in riak.conf set
> "anti_entropy = passive"
>
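> For reference, a minimal riak.conf sketch combining items 1 and 3
> (illustrative; merge with your existing settings):
>
>     ring_size = 16
>     anti_entropy = passive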
>
> The above 3 changes greatly reduce the runtime memory requirements.  I
> understand why your swappiness setting is helping.  It is pushing
> executable pages into swap in favor of data memory pages.  That is going to
> help in your situation.  Do not waste time going back to swappiness=0.
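>
> For completeness, a sketch of how such a swappiness change is typically
> applied on Linux (assuming a standard sysctl setup; file locations vary
> by distribution):
>
>     # apply immediately:
>     sysctl -w vm.swappiness=20
>     # persist across reboots by adding to /etc/sysctl.conf:
>     vm.swappiness = 20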
>
> Matthew
>
>
>
> On May 2, 2017, at 8:39 AM, Daniel Miller <dmil...@dimagi.com> wrote:
>
> Hi Matthew,
>
> I have attached the file generated by riak-debug as requested. Thanks for
> taking a look at this for me.
>
> The node I ran this on has had swappiness temporarily set back to 0 as it
> was when the load avg was spiking and the node was becoming unresponsive. I
> say "temporarily" because I had changed swappiness to 20 on all nodes over
> the weekend in an attempt to make the cluster more stable than it had been
> with swappiness=0. Incidentally, setting swappiness to 20 seemed
> to calm things down and the cluster has been stable with no issues over the
> weekend, which is great news, although a little confusing. I did notice
> that memory use is slowly increasing, so it's possible that once all of
> swap has been consumed the cluster will become unstable again.
>
> In case you haven't reviewed the history on this thread, this is not a
> standard Riak CS configuration. I'm using leveldb for both be_blocks and
> be_default as directed by Luke Bakken after he discovered that the previous
> thing I had tried (storage_backend=leveldb with no advanced.config, and
> therefore no multi_backend) could result in silent loss of data
> due to manifests being randomly overwritten.
>
> The reason we're trying to use leveldb for both backends rather than
> bitcask is to hopefully run Riak in a more RAM constrained environment than
> is typical. As I understand it, bitcask keeps all keys in RAM while leveldb
> does not. The tradeoff of using leveldb instead of bitcask is
> additional latency, but so far this has not been a problem for us.
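>
> As a rough, purely illustrative calculation of why this matters: bitcask
> keeps an in-memory keydir entry for every key, costing roughly the key
> length plus a few tens of bytes of per-entry overhead. With, say, 50
> million keys per node averaging 40 bytes each and ~45 bytes of overhead,
> that is about 50M * (40 + 45) bytes ~= 4.25 GB of RAM for keys alone.
> leveldb keeps its index structures on disk, so its RAM use is bounded by
> its configured caches rather than by key count.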
>
> Daniel
>
>
> On Fri, Apr 28, 2017 at 10:15 AM, Matthew Von-Maszewski <
> matth...@basho.com> wrote:
>
>> Daniel,
>>
>> Something is wrong.  All instances of leveldb within a node share the
>> total memory configuration.  The memory is equally divided between all
>> active vnodes.  It is possible to create an OOM situation if total RAM is
>> low and the vnode count per node is high relative to RAM size.
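>>
>> To make that concrete with illustrative numbers: a ring_size of 128
>> spread across, say, 5 nodes puts roughly 128 / 5 ~= 26 vnodes on each
>> node. If leveldb is granted 70% of 1 GB of RAM, each vnode gets only
>> about 700 MB / 26 ~= 27 MB to work with.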
>>
>> The best next step would be for you to execute the riak-debug program on
>> one of the nodes known to experience OOM.  Send the resulting .tar.gz file
>> directly to me (no need to share that with the mailing list).  I will
>> review the memory situation and suggest options.
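>>
>> A sketch of that step (run on the affected node; the archive name below
>> is illustrative):
>>
>>     sudo riak-debug
>>     # produces <nodename>-riak-debug.tar.gz in the current directory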
>>
>> Matthew
>>
>> On Apr 28, 2017, at 8:22 AM, Daniel Miller <dmil...@dimagi.com> wrote:
>>
>> Hi Luke,
>>
>> I'm reviving this thread from March where we discussed a new backend
>> configuration for our riak cluster. We have had a chance to test out the
>> new recommended configuration, and so far we have not been successful in
>> limiting the RAM usage of leveldb with multi_backend, despite trying
>> several configurations:
>>
>> First try (default config).
>> riak.conf: leveldb.maximum_memory.percent = 70
>>
>> Second try.
>> riak.conf: leveldb.maximum_memory.percent = 40
>>
>> Third try.
>> riak.conf: #leveldb.maximum_memory.percent = 40 (commented out)
>> advanced.config: [{eleveldb, [{total_leveldb_mem_percent, 30}]}, ...
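>>
>> For reference, a sketch of the complete advanced.config for that third
>> attempt, combining the eleveldb setting with the multi_backend section
>> Luke suggested below (paths illustrative):
>>
>>     [
>>         {eleveldb, [
>>             {total_leveldb_mem_percent, 30}
>>         ]},
>>         {riak_kv, [
>>             {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.1.1/ebin"]},
>>             {storage_backend, riak_cs_kv_multi_backend},
>>             {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
>>             {multi_backend_default, be_default},
>>             {multi_backend, [
>>                 {be_default, riak_kv_eleveldb_backend, [
>>                     {data_root, "/opt/data/ecryptfs/riak"}
>>                 ]},
>>                 {be_blocks, riak_kv_eleveldb_backend, [
>>                     {data_root, "/opt/data/ecryptfs/riak_blocks"}
>>                 ]}
>>             ]}
>>         ]}
>>     ].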
>>
>> In all cases (under load) riak consumes all available RAM and eventually
>> becomes unresponsive, presumably due to OOM conditions. Is there a way to
>> limit the amount of RAM consumed by riak with the new multi_backend
>> configuration? For example, do we need to consider ring size or other
>> configuration parameters when calculating the value of
>> total_leveldb_mem_percent?
>>
>> Notably, the old (storage_backend = leveldb in riak.conf, empty
>> advanced.config) clusters have had very good RAM and disk usage
>> characteristics. Is there any way we can make riak or riak cs avoid the
>> rare occasions where it overwrites the manifest file while using this
>> (non-multi) backend?
>>
>> Thank you,
>> Daniel Miller
>>
>>
>> On Tue, Mar 7, 2017 at 3:58 PM, Luke Bakken <lbak...@basho.com> wrote:
>>
>>> Hi Daniel,
>>>
>>> Thanks for providing all of that information.
>>>
>>> You are missing important configuration for riak_kv that can only be
>>> provided in an /etc/riak/advanced.config file. Please see the following
>>> document, especially the section to which I link here:
>>>
>>> http://docs.basho.com/riak/cs/2.1.1/cookbooks/configuration/riak-for-cs/#setting-up-the-proper-riak-backend
>>>
>>> [
>>>     {riak_kv, [
>>>         % NOTE: double-check this path for your environment:
>>>         {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.1.1/ebin"]},
>>>         {storage_backend, riak_cs_kv_multi_backend},
>>>         {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
>>>         {multi_backend_default, be_default},
>>>         {multi_backend, [
>>>             {be_default, riak_kv_eleveldb_backend, [
>>>                 {data_root, "/opt/data/ecryptfs/riak"}
>>>             ]},
>>>             {be_blocks, riak_kv_eleveldb_backend, [
>>>                 {data_root, "/opt/data/ecryptfs/riak_blocks"}
>>>             ]}
>>>         ]}
>>>     ]}
>>> ].
>>>
>>> Your configuration will look like the above. The contents of this file
>>> are merged with the contents of /etc/riak/riak.conf to produce the
>>> configuration that Riak uses.
>>>
>>> Notice that I chose riak_kv_eleveldb_backend twice because of the
>>> discussion you had previously about RAM usage and bitcask:
>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2016-November/018801.html
>>>
>>> In your current configuration, you are not using the expected prefix for
>>> the block data. My guess is that on very rare occasions your data happens
>>> to overwrite the manifest for a file. You may also have corrupted files at
>>> this point without noticing it at all.
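>>>
>>> To illustrate what the prefix rule does (bucket name hypothetical):
>>> Riak CS names its block buckets with a "0b:" prefix, so with
>>>
>>>     {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]}
>>>
>>> a bucket like <<"0b:...">> is routed to be_blocks, while manifest
>>> buckets, which lack the prefix, fall through to be_default.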
>>>
>>> IMPORTANT: you can't switch from your current configuration to this
>>> new one without re-saving all of your data.
>>>
>>
>>
> <r...@hqriak17.internal-in.commcarehq.org-riak-debug.tar.gz>
>
>
>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
