[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2014-03-06 Thread Adam Cooper (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922481#comment-13922481
 ] 

Adam Cooper commented on COUCHDB-1946:
--

After hours of frustration trying to set up an npm registry with CouchDB 1.5.0, 
I'm glad to have found this thread and to know it isn't just me. 

The only way I was able to replicate https://fullfatdb.npmjs.com/registry was 
by installing CouchDB 1.2.0. Is this the current recommendation until this bug 
is resolved?

Thanks!

> Trying to replicate NPM grinds to a halt after 40GB
> ---
>
> Key: COUCHDB-1946
> URL: https://issues.apache.org/jira/browse/COUCHDB-1946
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Reporter: Marc Trudel
> Priority: Critical
> Attachments: couch.log
>
>
> I have been able to replicate the Node.js NPM database until 40G or so, then 
> I get this:
> https://gist.github.com/stelcheck/7723362
> In one case I have gotten a flat-out OOM error, but I didn't take a dump of 
> the log output at the time.
> CentOS6.4 with CouchDB 1.5 (also tried 1.3.1, but to no avail). Also tried to 
> restart replication from scratch - twice - both cases stalling at 40GB.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2014-01-15 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872605#comment-13872605
 ] 

Adam Kocoloski commented on COUCHDB-1946:
-

I guess that's COUCH-448.  Bit of an oversight related to the complexity of 
doing multipart uploads.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2014-01-15 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872600#comment-13872600
 ] 

Robert Newson commented on COUCHDB-1946:


We should enhance the replicator to ship attachments with compression if it 
doesn't already do so.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2014-01-15 Thread Nathan Caza (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872345#comment-13872345
 ] 

Nathan Caza commented on COUCHDB-1946:
--

I should probably mention, this is on 1.5.0

Upon further investigation, I found there are far larger documents, for 
instance:
cordova@258-f88ea93535011521e52219071ebba13b

With attachments, the request is 2336256011 bytes, or 2.18GiB.
Depending on how these get pushed in, I imagine a few packages like that (the 
next largest are ~1.6GiB, 1.3GiB, etc.) being replicated in parallel would 
explain the large memory usage, and may be the cause of this issue. It could 
also explain why 12G of RAM seemed to handle it.
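
The size arithmetic above can be checked with a short sketch. It assumes, as a 
worst case that this thread does not confirm, that the three largest requests 
are held in memory in full at the same time; the function name is made up for 
illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


# Request size quoted in the comment above.
cordova_request = 2_336_256_011
print(f"{to_gib(cordova_request):.2f} GiB")  # 2.18 GiB

# Approximate sizes (GiB) of the three largest requests mentioned above.
# Assumption: all three are fully buffered at once.
largest = [to_gib(cordova_request), 1.6, 1.3]
print(f"{sum(largest):.2f} GiB if all three are buffered at once")  # 5.08 GiB
```

A handful of such requests in flight would account for several GiB on their 
own, which fits the guess that a 12G machine copes and a much smaller one does 
not.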



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2014-01-14 Thread Nathan Caza (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871736#comment-13871736
 ] 

Nathan Caza commented on COUCHDB-1946:
--

So, I've been experiencing this. I've had a script replicating 1-by-1 (no bulk 
api). I haven't had any issues, EXCEPT that as replication goes on, small 
documents whiz by and now, towards the end, all that is left is a large number 
of bigger documents.

I skipped large documents on the first pass, but I presume that had I not, I 
would have run out of sockets/workers as they slowly filled up with the large 
transfers/documents.
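
The two-pass approach described above (small documents first, large ones 
deferred) might look like the following sketch. The script actually used is 
not shown in this thread, so the function name, the threshold and the sample 
ids and sizes are all assumptions for illustration.

```python
def split_by_size(docs, threshold_bytes):
    """Split (doc_id, size_in_bytes) pairs into a first pass and a deferred pass.

    Documents at or below the threshold keep their original order; larger
    ones are deferred and sorted smallest first.
    """
    small = [d for d in docs if d[1] <= threshold_bytes]
    large = sorted((d for d in docs if d[1] > threshold_bytes), key=lambda d: d[1])
    return small, large


# Made-up ids and sizes, purely illustrative.
docs = [
    ("pkg-a", 12_000),
    ("pkg-b", 40_000_000),
    ("pkg-c", 3_500),
    ("pkg-d", 2_000_000_000),
]
first_pass, deferred = split_by_size(docs, threshold_bytes=10_000_000)
print(first_pass)
print(deferred)
```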

Also note that one of these, as an example, is about ~40MB (a unicode 
document/package, I believe); when simply gzipped it becomes ~1.3MB.

Are there plans to support gzip compression? 
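
To give a feel for the kind of saving mentioned above (~1.3MB out of ~40MB is 
a ratio of roughly 0.03), here is a small sketch using Python's gzip module. 
The payload is a synthetic, highly repetitive stand-in, not the actual 
package, so the ratio it prints is only indicative.

```python
import gzip


def gzip_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size."""
    return len(gzip.compress(payload)) / len(payload)


# Synthetic stand-in for a large, repetitive text document (an assumption;
# the real package contents are not part of this thread).
sample = ("U+0041 LATIN CAPITAL LETTER A\n" * 100_000).encode("utf-8")
print(f"{len(sample)} bytes, compression ratio {gzip_ratio(sample):.4f}")
```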




[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-26 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857222#comment-13857222
 ] 

Adam Kocoloski commented on COUCHDB-1946:
-

Hi [~indexzero], there are absolutely material differences for local vs. remote 
replications -- that's one axis I've been meaning to explore as soon as I have 
a chance to re-engage on this ticket.  For example, I've not seen this happen 
on a replication to Cloudant (which has no real notion of a "local" 
replication).

The other thing I'd love to see happen is a narrowing down of the list of 
affected versions; 1.2.0..1.5.0 is an awfully large changeset to bisect.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-26 Thread Charlie Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857213#comment-13857213
 ] 

Charlie Robbins commented on COUCHDB-1946:
--

[~janl] Some more data about this:

* Replication got stuck at update_seq 861259 from the public npm registry
* Replacing the registry with a new .couch file via rsync solved the 
replication problem
* We had another local replication of the registry running, and against the new 
.couch file local replication ran just fine past 861259

Any material differences / optimizations for local vs. remote replications?



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-26 Thread Charlie Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857153#comment-13857153
 ] 

Charlie Robbins commented on COUCHDB-1946:
--

[~janl] Hitting the individual master CouchDBs directly. Not going through the 
load balancer.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-26 Thread Jan Lehnardt (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857118#comment-13857118
 ] 

Jan Lehnardt commented on COUCHDB-1946:
---

[~indexzero] Are you replicating from a single CouchDB or through a load 
balancer with multiple couches behind it?



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-26 Thread Charlie Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857105#comment-13857105
 ] 

Charlie Robbins commented on COUCHDB-1946:
--

Was able to reproduce this in a similar, but distinctly different use-case. The 
replication of the registry for one of our unused private replicas was not 
restarted for a couple of weeks. This led to a disk_size difference between 
the replica and the public npm registry of >40GB. 

After restarting the replication we are now seeing this same behavior. Since 
these registries were up to date when this issue was opened they were already 
past the document that Dave mentioned. Looks to be a particular class of 
document(s) that causes this, not just a single document.

Dave: is there a way to figure out which document(s) could be causing this?



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-10 Thread Nick North (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844069#comment-13844069
 ] 

Nick North commented on COUCHDB-1946:
-

I agree with Emilien: even if it doesn't actually break on 1.5, there's 
something wrong if it needs 12GB of memory now when it was fine with 1GB 
before, so it should be regarded as a bug. The NPM registry is one of the most 
prominent uses of CouchDB and, if it doesn't work well there, it dents public 
confidence.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Emilien Kenler (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844045#comment-13844045
 ] 

Emilien Kenler commented on COUCHDB-1946:
-

Hi,

As I said, I was able to replicate the npm registry with only 1GB of memory on 
CouchDB 1.2.0.
Even if the replication works on CouchDB 1.5.0, it should work with a similar 
amount of memory.
I don't think that it's an Erlang-related issue, because I use the same version 
of Erlang for my tests.
I use the default configuration of CouchDB, so I think it's strange to have to 
change something to make the replication work on 1.5.0 if it works without any 
change on 1.2.0.

I can provide access to my two test VMs, and I can provide new VMs if needed.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Dave Cottlehuber (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843588#comment-13843588
 ] 

Dave Cottlehuber commented on COUCHDB-1946:
---

I've pretty much finished a full replication of the registry with 1.5.0, 
multiple pipelines etc, filling my adsl link as much as possible, and I've not 
had any issues. Maximum RAM went up to ~ 12GiB (I have plenty to spare) but 
there were no hitches, no need to copy individual docs across, and no need to 
restart replication, despite many transient network issues & apparent npm 
disconnects.

What are people's thoughts on this?

- is this a bug, or do we simply propose some tuning enhancements?
- we could clearly reduce memory consumption, though as [~kocolosk] pointed out 
this will likely come with a cpu hit




[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Thor Anker Lange (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843191#comment-13843191
 ] 

Thor Anker Lange commented on COUCHDB-1946:
---

I am doing a replication with CouchDB 1.2.2-R15B03 and this seems to have 
gotten past the document causing problems - currently at seq 772595.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Thor Anker Lange (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843029#comment-13843029
 ] 

Thor Anker Lange commented on COUCHDB-1946:
---

I have been using the default replicator settings before I started trying to 
adjust the batch size - which did not help. Here is the last setup I tried:

{code}
{
 "_id": "npmjs_repl",
 "source":"http://isaacs.iriscouch.com/registry/",
 "target":"registry",
 "continuous":false,
 "worker_batch_size":50,
 "user_ctx": {"name":"xpaadmin", "roles":["_admin"]}
}
{code}
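
A replication document like the one above is normally stored in the 
_replicator database. The sketch below builds (but does not send) such a 
request with Python's standard library. The base URL http://localhost:5984 and 
the function name are assumptions, and a real request would also need admin 
credentials, which are left out here.

```python
import json
from urllib.request import Request


def replication_request(couch_url: str, doc: dict) -> Request:
    """Build a POST request that would store `doc` in the _replicator database."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        f"{couch_url.rstrip('/')}/_replicator",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same fields as the replication document quoted above.
doc = {
    "_id": "npmjs_repl",
    "source": "http://isaacs.iriscouch.com/registry/",
    "target": "registry",
    "continuous": False,
    "worker_batch_size": 50,
    "user_ctx": {"name": "xpaadmin", "roles": ["_admin"]},
}

req = replication_request("http://localhost:5984", doc)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it; omitted so that the
# sketch has no side effects.
```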



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Dave Cottlehuber (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843018#comment-13843018
 ] 

Dave Cottlehuber commented on COUCHDB-1946:
---

[~stelcheck] agreed
[~thor.lange]

There's something with replicating this specific doc that seems to trigger 
issues. Here's what I used to identify it (call the source db's _changes feed 
with since=):
http://isaacs.iriscouch.com/registry/_changes\?limit\=2\&since\=701251
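
The same lookup can be sketched in Python: build the _changes URL for a given 
sequence number and pull the document ids out of the response. The function 
names and the sample response are made up for illustration; only the URL and 
the since/limit values come from the comment above.

```python
from urllib.parse import urlencode


def changes_url(db_url: str, since: int, limit: int) -> str:
    """Return the _changes URL that lists `limit` changes after `since`."""
    query = urlencode({"limit": limit, "since": since})
    return f"{db_url.rstrip('/')}/_changes?{query}"


def doc_ids_from_changes(body: dict) -> list:
    """Extract document ids from a decoded _changes response."""
    return [row["id"] for row in body["results"]]


print(changes_url("http://isaacs.iriscouch.com/registry/", since=701251, limit=2))

# Illustrative response shape with a placeholder id, not real registry data.
sample = {
    "results": [
        {"seq": 701252, "id": "example-doc", "changes": [{"rev": "1-abc"}]},
    ],
    "last_seq": 701252,
}
print(doc_ids_from_changes(sample))
```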

Here are some things you can try:

# option 1

- delete all existing replications
- compact your DB if there's a big difference between data size and on-disk 
size. jq is awesome for this.

curl -s http://localhost:5984/registry | jq '(.disk_size | tonumber) - (.data_size | tonumber)'

http://stedolan.github.io/jq/

This is a good spot to copy the registry.couch file if you have space, in case 
you need to revert back to it.

-  replicate the single failing document by POSTing this to _replicator. This 
could take a *while*.

{code}
{
   "source": "http://isaacs.iriscouch.com/registry",
   "target": "registry",
   "doc_ids": [
   "as-stream"
   ],
   "owner": "admin"
}
{code}

- this is simply replicating the single stuck document. If you do this, I would 
love an ngrep or tcpdump of the traffic to see what happens on the wire during 
these stuck transfers

- once this is completed, you can then run the normal replication again.
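
The two mechanical parts of option 1, the size check before compaction and the 
single-document replication, can be sketched as follows. The function names 
and the sample sizes are assumptions for illustration; the disk_size and 
data_size fields are the ones read by the jq one-liner above.

```python
def reclaimable_bytes(db_info: dict) -> int:
    """Difference between on-disk size and live data size, in bytes.

    `db_info` is the decoded JSON returned by GET /registry. The values are
    passed through int() in case they arrive as strings.
    """
    return int(db_info["disk_size"]) - int(db_info["data_size"])


def single_doc_replication(source: str, target: str, doc_id: str) -> dict:
    """Build a replication document restricted to a single document id."""
    return {"source": source, "target": target, "doc_ids": [doc_id]}


# Made-up sizes, purely illustrative.
info = {"db_name": "registry", "disk_size": 50_000_000_000, "data_size": 42_000_000_000}
print(reclaimable_bytes(info))

print(single_doc_replication(
    "http://isaacs.iriscouch.com/registry", "registry", "as-stream"
))
```

A large difference suggests compaction is worthwhile before retrying; the 
returned dict is the body that would be POSTed to _replicator.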

# option 2

Install an older release of CouchDB and see if it doesn't get stuck here:

https://archive.apache.org/dist/couchdb/binary/win/1.2.2/

If you *can* please try the R15B03-1 release first, report back, and then the 
R14B04 one. It's not yet clear to me if the issue we are seeing is also related 
to garbage collection differences in Erlang/OTP between releases, or solely 
within CouchDB.

# option 3

Sometime later (hopefully today), I should have a BitTorrent-accessible 
version of npm. I need to update & compact first, this is pretty much IO 
limited :-).




[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Marc Trudel (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843016#comment-13843016
 ] 

Marc Trudel commented on COUCHDB-1946:
--

I tried all sorts of settings, but always ended up with the same outcome.
Just took more or less time.






-- 
Marc Trudel-Belisle
Chief Technology Officer | Wizcorp Inc.
TECH . GAMING . OPEN-SOURCE WIZARDS
+81 3-4550-1448




[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Alexander Shorin (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843011#comment-13843011
 ] 

Alexander Shorin commented on COUCHDB-1946:
---

Hey, [~stelcheck], [~thor.lange], [~MiLk]

Are you using default replicator settings?  Just to be sure that we are all on 
the same configuration, since following [~dch]'s suggestion from the first post 
helps a lot with reducing system resource usage during replication.



Re: [jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Nick North
When I tried replicating the NPM registry on Windows it bluescreened the
machine when the Erlang process reached about 1GB. On the plus side, once
the machine was rebooted, replication continued past 40GB, though it did
begin to slow down at that point. I don't really want the registry, so
stopped replicating at that point, but I got the feeling that you could get
the full registry onto a Windows CouchDB installation if you stop and
restart the CouchDB service when memory consumption gets too high.

Nick


On 9 December 2013 09:10, Thor Anker Lange (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842996#comment-13842996]
>
> Thor Anker Lange commented on COUCHDB-1946:
> ---
>
> I am seeing the exact same behaviour when reaching 30-40 GB running
> CouchDB on Windows (yes, unfortunately I do not have a choice of server OS).
> The source start sequence is around 701252.
>
> Marc, how did you manage to get a complete replica of the npm registry?


[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Marc Trudel (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842999#comment-13842999
 ] 

Marc Trudel commented on COUCHDB-1946:
--

I downloaded the .couch file from someone :)

I'd really suggest that the people running NPM make nightlies of this file,
and make them available for download. Regardless of this bug, it would be
much faster than starting with replication (and pretty similar to standard
procedures for setting up replicas in other DBMS, like MySQL)










[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-09 Thread Thor Anker Lange (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842996#comment-13842996
 ] 

Thor Anker Lange commented on COUCHDB-1946:
---

I am seeing the exact same behaviour when reaching 30-40 GB running CouchDB on 
Windows (yes, unfortunately I do not have a choice of server OS). The source 
start sequence is around 701252.

Marc, how did you manage to get a complete replica of the npm registry?



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-05 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840949#comment-13840949
 ] 

Adam Kocoloski commented on COUCHDB-1946:
-

I played with setting the {{\{fullsweep_after, 0\}}} option on {{couch_file}} 
and {{couch_stream}}.  It didn't have much of an effect on {{couch_stream}} 
(not surprising, since I was already hibernating the server after each write), 
but it reduced the memory consumption of {{couch_file}} to ~nothing.

The reason for the {{couch_stream}} refc binary memory consumption continues to 
elude me.  I think the next step may be to head up the stack towards the 
replicator processes.  Still, progress.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-05 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840549#comment-13840549
 ] 

Adam Kocoloski commented on COUCHDB-1946:
-

[~dch] was kind enough to allow me onto his VM this afternoon to poke around.  
Thanks Dave!  Here's what I found:

# The memory utilization is all taken up in refc binaries, not processes.
# The binaries are mostly attached to couch_stream processes.
# Adding a hibernate after each write by the stream process goes a _long_ way 
towards stabilizing memory usage.

For posterity here's how you sum up the size of all binaries attached to a 
process P:

{code}
BinMem = fun(P) ->
    case process_info(P, binary) of
        {binary, Bins} -> lists:sum([Size || {_, Size, _} <- Bins]);
        _ -> 0
    end
end.
{code}

and here's the diff against 1.5.0 to cause couch_stream to hibernate after each 
write:

{code}
diff --git a/src/couchdb/couch_stream.erl b/src/couchdb/couch_stream.erl
index 959feef..4067ff7 100644
--- a/src/couchdb/couch_stream.erl
+++ b/src/couchdb/couch_stream.erl
@@ -255,7 +255,7 @@ handle_call({write, Bin}, _From, Stream) ->
 buffer_len=0,
 md5=Md5_2,
 identity_md5=IdenMd5_2,
-identity_len=IdenLen + BinSize}};
+identity_len=IdenLen + BinSize}, hibernate};
 true ->
 {reply, ok, Stream#stream{
 buffer_list=[Bin|Buffer],
{code}

Adding that patch _will_ cause an increase in CPU consumption when writing 
attachments.  There may well be more subtle changes (e.g. playing with the 
{{fullsweep_after}} option when starting the streamer) that could achieve 
stability with fewer CPU cycles.

I should also note that while memory usage is far more stable it is still 
sitting at 2.2 GB RES right now and seems to be gradually climbing over time, 
so don't go replicating to a t1.micro instance just yet.  I think we do have a 
relatively good understanding of what's going on at this point, though.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-05 Thread Dave Cottlehuber (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839966#comment-13839966
 ] 

Dave Cottlehuber commented on COUCHDB-1946:
---

Hey Adam, thanks. Yes it does, repeatedly. It takes around 10-30 minutes to 
bloat. Any other pro debugging tips appreciated! I've not got time to work on 
this much today, more news on Friday.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-04 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839738#comment-13839738
 ] 

Adam Kocoloski commented on COUCHDB-1946:
-

Hmm, this is certainly strange.  Does restarting the replication after the 
crash cause it to crash again?

If you happen to have an Erlang shell on the node when it starts leaking 
memory, I'd be curious to see the output of something like

lists:reverse(lists:sort([{process_info(P, [memory,initial_call,dictionary]), 
P} || P <- processes()])).



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-04 Thread Marc Trudel (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839699#comment-13839699
 ] 

Marc Trudel commented on COUCHDB-1946:
--

I have a full replica running now, thanks to Emilien. Please PM me if you
want access to it to test replication.









[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-04 Thread Dave Cottlehuber (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839477#comment-13839477
 ] 

Dave Cottlehuber commented on COUCHDB-1946:
---

Hey folks,

Something does happen around 30-40GB; I have reproduced the situation here.

In both your issues, the final error is termination of the couchjs process (used 
for map/reduce and any JavaScript-y stuff) by the OOM killer. You can see things 
like this in the log; however, that's not the root cause.

{code}
[Tue, 03 Dec 2013 01:47:46 GMT] [error] [<0.368.0>] OS Process died with 
status: 137
[Tue, 03 Dec 2013 01:47:47 GMT] [error] [<0.368.0>] ** Generic server <0.368.0> 
terminating 
** Last message in was {#Port<0.2771>,{exit_status,137}}
** When Server state == {os_proc,"/usr/bin/couchjs 
/usr/share/couchdb/server/main.js",
 #Port<0.2771>,
 #Fun,
 #Fun,5000}
{code}
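
For reference, the exit status 137 in the log above follows the usual Unix convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL), which is the signal the kernel OOM killer sends. A minimal Python sketch of that decoding, assuming a POSIX system (the helper name `describe_exit_status` is hypothetical and not part of CouchDB):

```python
import signal

def describe_exit_status(status):
    """Decode a shell-style exit status.

    By convention, a value above 128 means the process was killed by
    signal number (status - 128). This is an illustrative helper only.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with code %d" % status

print(describe_exit_status(137))
```

Seeing SIGKILL here rather than an ordinary non-zero exit code fits with the process being killed from outside rather than failing on its own.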

This is the issue you're hitting, and you need to at least ensure ulimits etc
are set appropriately for the couchdb user:

http://wiki.apache.org/couchdb/Performance

You can optionally turn the OOM killer off for the beam.smp process directly:

{code}
# assuming you run only one erlang VM at a time
echo '-1000' > /proc/`pgrep beam.smp`/oom_score_adj
{code}

It can be helpful to increase os process timeout, although that's not
what you're hitting here from what I see.

{code}
curl -XPUT https://admin:passwd@localhost:5984/_config/couchdb/os_process_timeout -d '"6"'
{code}

However, there *is* some other issue here: before the OOM killer kicks in, 
CouchDB starts consuming a lot of memory. My instance cruises along at < 300MB 
RES RAM for most of the replication, and then starts shooting up very rapidly 
past 3GB, and then it's all over red rover. I'm working on tracking down exactly 
what this is, but it's somewhat tricky given the amount of concurrent stuff 
happening.

{code}
  Node: npm@z1 (Connected) (R15B01/5.9.1) unix (linux 3.2.0) CPU:4 SMP +A:16 +K
Time: local time 15:04:41, up for 000:00:33:51, 0ms latency,
Processes: total 504 (RQ 42) at 53424637 RpI using 63038.5k (63084.9k allocated)
Memory: Sys 2490.3m, Atom 264.9k/266.3k, Bin 2480.2m, Code 6292.2k, Ets 885.3k

Interval 1000ms, Sorting on "HTot" (Descending), Retrieved in 15ms
 Pid Registered Name  Reductions   MQueue HSize  SSize  HTot
  <0.4203.0> -459031   0  514229 9  832040
  <0.5277.0> -310990  121393 19 439204
 <0.6.0> error_logger 17874687 0  46368  8  364179
  <0.5840.0> -367940   1  28657  189225075
  <0.6295.0> -


Node: npm@z1 (Connected) (R15B01/5.9.1) unix (linux 3.2.0) CPU:4 SMP +A:16 +K
Time: local time 15:04:22, up for 000:00:33:32, 0ms latency,
Processes: total 504 (RQ 46) at 53654512 RpI using 62889.0k (62899.1k allocated)
Memory: Sys 2075.6m, Atom 264.9k/266.3k, Bin 2065.5m, Code 6292.2k, Ets 880.0k

Interval 1000ms, Sorting on "SSize" (Descending), Retrieved in 17ms
 Pid Registered Name  Reductions   MQueue HSize  SSize  HTot
  <0.6295.0> -351722   0  28657  204225075
  <0.5840.0> -340874   0  28657  158225075
  <0.5063.0> -302674   1  17711  118139104
  <0.4434.0> -693140  10946  86 85971
  <0.5062.0> -
  
  
Node: npm@z1 (Connected) (R15B01/5.9.1) unix (linux 3.2.0) CPU:4 SMP +A:16 +K
Time: local time 15:06:02, up for 000:00:35:12, 0ms latency,
Processes: total 496 (RQ 42) at 54048301 RpI using 58790.1k (58810.5k allocated)
Memory: Sys 2904.0m, Atom 264.9k/266.3k, Bin 2893.9m, Code 6292.2k, Ets 908.1k

Interval 1000ms, Sorting on "HSize" (Descending), Retrieved in 18ms
 Pid Registered Name  Reductions   MQueue HSize  SSize  HTot
  <0.4203.0> -459031   0  514229 9  832040
  <0.5002.0> -315640  196418 19 196418
  <0.5663.0> -362480  196418 19 196418
<0.12.0> rex  1330480  0  121393 9  121770
  <0.3919.0> couch_stats_a
  {code}

For the last 15 minutes before the crash, the message queue for one process 
appears stuck at 10: `MQueue 10 <0.4203.0>`, which could be normal or not, but I 
do think it's weird.

At the moment npm itself is not letting me replicate, so I can't debug further 
until it's available again.



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-03 Thread Emilien Kenler (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837773#comment-13837773
 ] 

Emilien Kenler commented on COUCHDB-1946:
-

Hi,

I tried to do an npm replica too.

I set up a new VM with 1GB RAM on Debian Wheezy with CouchDB 1.2.0 from the 
repository and Erlang R15B01 (erts-5.9.1).
I created the replica by using the following command:
~curl -X POST http://127.0.0.1:5984/_replicate -d 
'{"source":"http://isaacs.iriscouch.com/registry/", "target":"registry", 
"continuous":true, "create_target":true}' -H "Content-Type: application/json"~
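
The same request can be sketched in Python using only the standard library. The host, source URL, and target name simply mirror the curl command above; the helper name `build_replication_request` is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

def build_replication_request(couch_url, source, target):
    """Build (but do not send) a POST to /_replicate mirroring the curl call."""
    body = {
        "source": source,
        "target": target,
        "continuous": True,
        "create_target": True,
    }
    return urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_replication_request(
    "http://127.0.0.1:5984",
    "http://isaacs.iriscouch.com/registry/",
    "registry",
)
print(req.get_method(), req.full_url)
# To actually start the replication: urllib.request.urlopen(req)
```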

In about 12h, I obtained a working replica (127GB).
Then I saw Marc's problem, and noticed that npm is on CouchDB 1.5.0.
I decided to run a new replica on a new VM with the same configuration except 
for CouchDB at 1.5.0.

I followed the instructions in the documentation and on the wiki, and I 
installed the libmozjs185-cloudant{,-dev} packages. 
http://wiki.apache.org/couchdb/Installing_on_Debian

I started the replication from my working CouchDB 1.2.0 server to the new CouchDB 
1.5.0 server.
In a few hours (about 36GB replicated), the replication stopped and it crashed 
(Out of memory in syslog).

I upgraded my VM to 8GB of memory, but nothing changed.

The error logs are here: https://gist.github.com/MiLk/a1abd11aedb24c20c9d5





[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-02 Thread Marc Trudel (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837229#comment-13837229
 ] 

Marc Trudel commented on COUCHDB-1946:
--

curl $COUCH/registry | python -mjson.tool
```javascript
{
"committed_update_seq": 51009,
"compact_running": false,
"data_size": 39439860666,
"db_name": "registry",
"disk_format_version": 6,
"disk_size": 47768499249,
"doc_count": 38858,
"doc_del_count": 3886,
"instance_start_time": "1386033114907901",
"purge_seq": 0,
"update_seq": 51009
}
```
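
To put those numbers in more readable terms, here is a small Python sketch that derives sizes in GiB from the response above. It assumes, as in the CouchDB 1.x database info, that `data_size` is the live data and `disk_size` is the size of the file on disk; the helper name `summarize_db_info` is hypothetical.

```python
import json

def summarize_db_info(info):
    """Return a few derived figures from a CouchDB database-info document."""
    gib = 1024 ** 3
    return {
        "update_seq": info["update_seq"],
        "data_gib": round(info["data_size"] / gib, 2),
        "disk_gib": round(info["disk_size"] / gib, 2),
        # Share of the file occupied by live data.
        "live_fraction": round(info["data_size"] / info["disk_size"], 3),
    }

# Values copied from the response above.
info = json.loads("""{
  "data_size": 39439860666,
  "disk_size": 47768499249,
  "doc_count": 38858,
  "update_seq": 51009
}""")
print(summarize_db_info(info))
```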

The logs are attached here. This is a log from startup to crash, so there is 
certainly a bit of noise in there.

Extra note: as the download goes, I replicate at 10Mbps until I hit the 40G mark. 
Then it slows down gradually until it drops a bit below 200Kbps.

I am downloading a .couch file copy from someone who already has a server up 
and running. I will give that a try too, and will report any findings...



[jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB

2013-12-02 Thread Dave Cottlehuber (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836943#comment-13836943
 ] 

Dave Cottlehuber commented on COUCHDB-1946:
---

Hi Marc,

thanks for reporting this. We'll need a bit more info to make headway on this; 
the error message you see is a relatively unimportant part of the Erlang VM.

If you can start your instance up, I'm interested in a couple of specific 
things:

what the last update sequence number of your registry copy is:

export COUCH=http://localhost:5984
curl $COUCH/registry

should return a JSON blob like this:

{
"committed_update_seq": 7305,
"compact_running": false,
"data_size": 2040934117,
"db_name": "registry",
"disk_format_version": 6,
"disk_size": 3444014974,
"doc_count": 5208,
"doc_del_count": 754,
"instance_start_time": "1386014605129594",
"purge_seq": 0,
"update_seq": 7305
}


Also, if you can make the couch.log file available (privately is fine, 
d...@apache.org GPG key for optional signing 
http://people.apache.org/~dch/KEYS) we might find something more enlightening.

Previous occurrences of the error you mentioned have all been related to 
insufficient memory. FWIW I'm running replication in < 2GB RAM atm on a GCE small 
instance, 2 cores, and that's CPU bound only.

https://couchdb.readthedocs.org/en/latest/config/replicator.html

Current parameters can be seen via GET /_config/replicator, and you can do 
updates via PUT or also via Futon's configuration interface. This will avoid the 
need to restart CouchDB, which is only required if you edit the ini file by 
hand.

reduce:
worker_batch_size to ease RAM pressure
worker_processes for disk & network IO
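
As a rough sketch of such an update from Python, the snippet below builds PUT requests against `/_config/replicator/<key>`, where the body is a JSON string. The host and the values 100 and 1 are illustrative assumptions, not recommendations, and the helper name `build_config_update` is hypothetical; nothing is sent until `urlopen` is called.

```python
import json
import urllib.request

def build_config_update(couch_url, section, key, value):
    """Build (but do not send) a PUT to /_config/<section>/<key>.

    The _config API expects the new value as a JSON string.
    """
    return urllib.request.Request(
        "%s/_config/%s/%s" % (couch_url.rstrip("/"), section, key),
        data=json.dumps(str(value)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Illustrative values only; pick numbers that suit your hardware.
updates = [
    build_config_update("http://localhost:5984", "replicator", "worker_batch_size", 100),
    build_config_update("http://localhost:5984", "replicator", "worker_processes", 1),
]
for req in updates:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# To apply a change: urllib.request.urlopen(req)  (admin credentials may be required)
```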

Other Couch folk report reducing these right down has 

Bear in mind that the central registry is under heavy load atm, so there may 
not be a great deal you can influence on your local node with respect to performance.

You can try reducing the parallelisation of replication;


