Good job on finding and fixing so fast.
I have to ask. What's with the naming scheme? Why not 2.9.2 instead of
2.9.0p2?
Cheers
Russell
On 28/06/2019 10:24, Martin Sumner wrote:
Bryan,
We saw that Riak was using much more memory than was expected at the
end of the handoffs. Using `riak-admin top` we could see that this
wasn't process memory, but binaries. Firstly did some work via attach
looping over processes and running GC to confirm that this wasn't a
failure to collect garbage - the references to memory were real. Then
did a bit of work in attach writing some functions to analyse
process_info/2 for each process (looking at binary and memory), and
discovered that there were penciller processes that had lots of
references to lots of large binaries (and this accounted for all the
unexpected memory use), and where the penciller was the only process
with a reference to the binary. This made no sense initially as the
penciller should only have small binaries (metadata). Then looked at
the running state of the penciller processes and could see no large
binaries in the state, but could see that a lot of the active keys in
the penciller were keys that were known to have large object values
(but small amounts of metadata) - and that the size of the object
values were the same as the size of the binary references found on the
penciller process via process_info/2.
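The kind of analysis described above can be sketched roughly as follows. This is not the code that was actually run in attach; the module and function names are hypothetical, and the `{Id, Size, RefCount}` shape of the `binary` item returned by process_info/2 is undocumented and may change between OTP releases.

```erlang
-module(bin_refs).
-export([forced_gc/0, top/1]).

%% Garbage collect every process first, to rule out the possibility
%% that the memory is simply garbage that has not yet been collected.
forced_gc() ->
    lists:foreach(fun erlang:garbage_collect/1, erlang:processes()).

%% Total size in bytes of the off-heap binaries a process references.
%% ASSUMPTION: each entry is {Id, Size, RefCount}; this is undocumented.
%% usort/1 drops identical entries so a binary is counted once.
binary_bytes(Pid) ->
    case erlang:process_info(Pid, binary) of
        {binary, Bins} ->
            lists:sum([Size || {_Id, Size, _RefCount} <- lists:usort(Bins)]);
        undefined ->
            0   % the process exited while we were looking
    end.

%% The N processes referencing the most binary memory, largest first,
%% as a list of {Bytes, Pid} tuples.
top(N) ->
    Sized = [{binary_bytes(P), P} || P <- erlang:processes()],
    lists:sublist(lists:reverse(lists:sort(Sized)), N).
```

Calling `bin_refs:forced_gc()` and then `bin_refs:top(10)` from an attached shell would list the ten heaviest holders of binary references, which can then be compared against the running state of each process.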
I then recalled the first part of this:
https://dieswaytoofast.blogspot.com/2012/12/erlang-binaries-and-garbage-collection.html.
It was obvious that in extracting the metadata the beam was naturally
retaining a reference to the whole binary, as long as the sub-binary
was retained by a process (the Penciller). Forcing a binary copy
resolved this referencing issue. It was nice that the same tools used
to detect the issue made it quite easy to write a test to confirm
resolution -
https://github.com/martinsumner/leveled/blob/master/test/end_to_end/riak_SUITE.erl#L1214-L1239.
The memory leak section of Fred Hebert's
http://www.erlang-in-anger.com/ is great reading for helping with
these types of issues.
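A minimal sketch of the sub-binary retention described above, and of how a forced copy avoids it. The 16-byte metadata prefix is a made-up layout for illustration only and is not leveled's actual object format; the module and function names are likewise hypothetical.

```erlang
-module(sub_binary_demo).
-export([leaky/1, copied/1]).

%% ASSUMPTION: the first 16 bytes are "metadata", the rest is the value.
%% Pattern matching yields a sub-binary, which still points into the
%% original binary and so keeps the whole of it alive.
leaky(<<Meta:16/binary, _Value/binary>>) ->
    Meta.

%% binary:copy/1 creates a fresh 16-byte binary with no link to the
%% original, so the large object can be collected once nothing else
%% references it.
copied(<<Meta:16/binary, _Value/binary>>) ->
    binary:copy(Meta).
```

binary:referenced_byte_size/1 can be used to see the difference: for the result of `leaky/1` it reports the size of the underlying binary rather than the 16-byte slice, whereas for the result of `copied/1` it reports only the slice.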
Thanks
Martin
On Fri, 28 Jun 2019 at 09:46, b h <bryanhuntwit...@gmail.com> wrote:
Nice work - I've read the issue / PR - how did you discover / track it
down - tools or just reading the code?
On Fri, 28 Jun 2019 at 09:35, Martin Sumner
<martin.sum...@adaptip.co.uk> wrote:
There is now a second update available for 2.9.0:
https://github.com/basho/riak/tree/riak-2.9.0p2.
This patch, like the patch before, resolves a memory
management issue in leveled, which this time could be
triggered by sending many large objects in a short period of
time. The underlying problem is described a bit further here
https://github.com/martinsumner/leveled/issues/285, and is
resolved by leveled working more sympathetically with the beam
binary memory management.
Switching to the patched version is not urgent unless you are
using the leveled backend, and may send a large number of
large objects in a burst.
Updated packages are available (thanks to Nick Adams at TI
Tokyo) - https://files.tiot.jp/riak/kv/2.9/2.9.0p2/
Thanks again to the testing team at the NHS Spine project,
Aaron Gibbon (BJSS) and Ramen Sen, who discovered the
problem. The issue was discovered in a handoff scenario where
there were tens of thousands of 2MB objects stored in a
portion of the keyspace at the end of the handoff - which led
to memory issues until either more PUTs were received (to
force a persist to disk) or a restart occurred.
Regards
On Sat, 25 May 2019 at 09:35, Martin Sumner
<martin.sum...@adaptip.co.uk> wrote:
Unfortunately, Riak 2.9.0 was released with an issue
whereby a race condition in heavy-PUT scenarios (e.g.
handoffs), could cause a leak of file descriptors.
The issue is described here -
https://github.com/basho/riak_kv/issues/1699, and the
underlying issue here -
https://github.com/martinsumner/leveled/issues/278.
There is a new patched version of the release available
(2.9.0p1) at
https://github.com/basho/riak/tree/riak-2.9.0p1. This
should be used in preference to the original release of 2.9.0.
Updated packages are available (thanks to Nick Adams at TI
Tokyo) - https://files.tiot.jp/riak/kv/2.9/2.9.0p1/
Thanks also to the testing team at the NHS Spine project,
Aaron Gibbon (BJSS) and Ramen Sen, who discovered the problem.
Regards
Martin
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com