[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

Adam Kocoloski (JIRA) Thu, 05 Dec 2013 12:50:43 -0800

    [ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840549#comment-13840549
 ]


Adam Kocoloski commented on COUCHDB-1946:
-----------------------------------------

[~dch] was kind enough to allow me onto his VM this afternoon to poke around.  
Thanks Dave!  Here's what I found:

# The memory utilization is all taken up in refc binaries, not processes.
# The binaries are mostly attached to couch_stream processes.
# Adding a hibernate after each write by the stream process goes a _long_ way 
towards stabilizing memory usage.
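
A split like the one in the first point can be checked from the Erlang shell with {{erlang:memory/1}}. This is a minimal sketch and not necessarily how the numbers above were obtained:

{code}
%% Bytes currently allocated for binaries versus for Erlang processes.
erlang:memory([binary, processes]).
{code}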

For posterity here's how you sum up the size of all binaries attached to a 
process P:

{code}
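%% Sum the sizes (in bytes) of all refc binaries referenced by process P.
%% Returns 0 if the process has already exited.
BinMem = fun(P) ->
    case process_info(P, binary) of
        {binary, Bins} -> lists:sum([Size || {_, Size, _} <- Bins]);
        _ -> 0
    end
end.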
{code}
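
To see which processes hold the most binary data, the same computation can be applied to every process and the results sorted. The sketch below is meant for the Erlang shell; {{BinBytes}} and {{TopBin}} are hypothetical names chosen for illustration, and {{BinBytes}} repeats the calculation above so that the snippet runs on its own:

{code}
%% Same calculation as BinMem, under a separate name so it can be pasted
%% into a shell where BinMem is already bound.
BinBytes = fun(P) ->
    case process_info(P, binary) of
        {binary, Bins} -> lists:sum([Size || {_, Size, _} <- Bins]);
        _ -> 0
    end
end.

%% Return up to N {Bytes, Pid} pairs, largest binary footprint first.
TopBin = fun(N) ->
    Sizes = [{BinBytes(P), P} || P <- processes()],
    lists:sublist(lists:reverse(lists:keysort(1, Sizes)), N)
end.

TopBin(10).
{code}

Each pid in the result can then be passed to {{process_info(Pid, initial_call)}} or similar to find out what kind of process it is.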

and here's the diff against 1.5.0 to cause couch_stream to hibernate after each 
write:

{code}
diff --git a/src/couchdb/couch_stream.erl b/src/couchdb/couch_stream.erl
index 959feef..4067ff7 100644
--- a/src/couchdb/couch_stream.erl
+++ b/src/couchdb/couch_stream.erl
@@ -255,7 +255,7 @@ handle_call({write, Bin}, _From, Stream) ->
                         buffer_len=0,
                         md5=Md5_2,
                         identity_md5=IdenMd5_2,
-                        identity_len=IdenLen + BinSize}};
+                        identity_len=IdenLen + BinSize}, hibernate};
     true ->
         {reply, ok, Stream#stream{
                         buffer_list=[Bin|Buffer],
{code}

Adding that patch _will_ cause an increase in CPU consumption when writing 
attachments.  There may well be more subtle changes (e.g. playing with the 
{{fullsweep_after}} option when starting the streamer) that could achieve 
stability with fewer CPU cycles.
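
As an illustration of the {{fullsweep_after}} idea, the option can be passed through {{spawn_opt}} when a gen_server is started. The module below is a hypothetical stand-in, not couch_stream, and whether any particular value actually stabilizes memory in this case is untested:

{code}
-module(stream_sketch).
-behaviour(gen_server).

-export([start_link/1, write/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical stand-in for a streamer process; this is not couch_stream.
%% FullsweepAfter is the maximum number of generational collections before
%% a full sweep is forced; 0 makes every collection a full sweep.
start_link(FullsweepAfter) ->
    Opts = [{spawn_opt, [{fullsweep_after, FullsweepAfter}]}],
    gen_server:start_link(?MODULE, [], Opts).

write(Pid, Bin) when is_binary(Bin) ->
    gen_server:call(Pid, {write, Bin}).

init([]) ->
    {ok, 0}.

%% Track only a running byte count. The reply carries no hibernate, so
%% reclaiming binaries is left to the collector settings chosen above.
handle_call({write, Bin}, _From, Len) ->
    {reply, ok, Len + byte_size(Bin)}.

handle_cast(_Msg, Len) ->
    {noreply, Len}.
{code}

After {{\{ok, Pid\} = stream_sketch:start_link(0)}}, calling {{process_info(Pid, garbage_collection)}} shows the setting in effect for that process.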

I should also note that while memory usage is far more stable it is still 
sitting at 2.2 GB RES right now and seems to be gradually climbing over time, 
so don't go replicating to a t1.micro instance just yet.  I think we do have a 
relatively good understanding of what's going on at this point, though.

> Trying to replicate NPM grinds to a halt after 40GB
> ---------------------------------------------------
>
>                 Key: COUCHDB-1946
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1946
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Marc Trudel
>         Attachments: couch.log
>
>
> I have been able to replicate the Node.js NPM database until 40G or so, then 
> I get this:
> https://gist.github.com/stelcheck/7723362
> In one case I have gotten a flat-out OOM error, but I didn't take a dump of 
> the log output at the time.
> CentOS 6.4 with CouchDB 1.5 (also tried 1.3.1, but to no avail). Also tried to 
> restart replication from scratch - twice - both cases stalling at 40GB.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
