Re: Confused handling of HEAD requests
On Wed, Jul 20, 2011 at 10:36 PM, Travis Jensen travis.jen...@gmail.com wrote:
> I'll dig into the archives and see if I can find the previous discussion on the topic. I might have to put an uber-ugly hack in for my use in the meantime. I don't mind spending time doing the right thing as long as I know what the right thing is.

I dug into the source archive to find out when the mapping happened: March 2008, when Mochiweb was incorporated (or updated, or something). The third issue in Jira was HEAD not being supported, and it was resolved as part of that checkin. No other communication about this specific topic appears to exist. Unfortunately, that would imply it wasn't a fix for something else, but rather the way CouchDB will handle HEAD. Now to decide how to deal with this for my project...

tj
--
Travis Jensen
Read the Software Maven @ http://softwaremaven.innerbrane.com/
Read my LinkedIn profile @ http://www.linkedin.com/in/travisjensen
Read my Twitter mumblings @ http://twitter.com/SoftwareMaven
Send me email @ travis.jen...@gmail.com
What kind of guy calls himself the Software Maven???
[jira] [Commented] (COUCHDB-1229) _update handler doesn't support slashes in doc _id
[ https://issues.apache.org/jira/browse/COUCHDB-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068872#comment-13068872 ]

Robert Newson commented on COUCHDB-1229:
----------------------------------------

After the first / to separate the database from the document, there shouldn't be any need to treat / characters as being special. Well, except for attachment URLs, surely? /db_name/doc_id/attachment_name

> _update handler doesn't support slashes in doc _id
> --------------------------------------------------
>
> Key: COUCHDB-1229
> URL: https://issues.apache.org/jira/browse/COUCHDB-1229
> Project: CouchDB
> Issue Type: Bug
> Components: HTTP Interface
> Affects Versions: 1.1
> Reporter: Simon Leblanc
> Labels: URI, id, update
>
> Let's say you have: a doc with _id foo/bar, a show function named baz, and an update function named baz. Then _show/baz/foo/bar is valid but _update/baz/foo/bar is not. Only _update/baz/foo%2Fbar works. This is particularly annoying when you want to rewrite /something/* to _update/baz/foo/* (rewriting /something/* to _show/baz/foo/* works perfectly).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
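For readers hitting the same asymmetry, it looks roughly like this; the database and design document names below are invented for illustration:

    # a show function accepts the literal slash in the doc id:
    curl http://localhost:5984/db/_design/app/_show/baz/foo/bar

    # the equivalent update function only works with the slash percent-encoded:
    curl -X POST http://localhost:5984/db/_design/app/_update/baz/foo/bar      # fails
    curl -X POST http://localhost:5984/db/_design/app/_update/baz/foo%2Fbar    # works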
[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
[ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068874#comment-13068874 ]

Robert Newson commented on COUCHDB-1226:
----------------------------------------

I've definitely seen this before in production. Unfortunately it's very hard to determine a cause. I started to suspect the Erlang VM itself. In my case it tried to allocate slightly more RAM than the machine had in total (so as not to keep you in suspense, it failed to do so).

> Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
> Key: COUCHDB-1226
> URL: https://issues.apache.org/jira/browse/COUCHDB-1226
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.1
> Environment: Gentoo Linux, CouchDB built using standard ebuild. Rebuilt July 2011.
> Reporter: James Marca
> Attachments: topcouch.log
>
> When replicating databases (pull replication), CouchDB will silently crash. I suspect a memory leak is leading to the crash, because I watch the beam process slowly creep up in RAM usage, then the server dies. For the crashing server, the log on debug doesn't seem very helpful. It says (with manually scrubbed server addresses):
>
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [info] [<0.10032.0>] starting new replication 431a3f5bae52a6b27da72e42dc7b9fe3+create_target at <0.10054.0>
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 2
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #2
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 10
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #10
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 20
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #20
> [Mon, 18 Jul 2011 16:23:25 GMT] [debug] [<0.10054.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 16:23:36 GMT] [info] [<0.10054.0>] recording a checkpoint for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 20
>
> Then, when I restart CouchDB, and restart the node.js program that is setting up the replication jobs, the crashed replication job picks up where it left off and completes just fine. Again, I scrubbed my server addresses in this log snippet:
>
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] 'POST' /_replicate {1,1} from 128.*.*.*
> Headers: [{'Authorization',"Basic amFtZXM6bWdpY24wbWIzcg=="},
>           {'Connection',"close"},
>           {'Content-Type',"application/json"},
>           {'Host',"***[pullserver]***.edu"},
>           {'Transfer-Encoding',"chunked"}]
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] OAuth Params: []
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 17:22:53 GMT] [info] [<0.3562.0>] starting new replication 431a3f5bae52a6b27da72e42dc7b9fe3+create_target at <0.3580.0>
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 22
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: [...]
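For context, this is the kind of request the node.js program is presumably issuing against the standard /_replicate API; hostnames below are placeholders, and the +create_target suffix in the replication id above suggests create_target was set:

    curl -X POST http://localhost:5984/_replicate \
         -H "Content-Type: application/json" \
         -d '{"source": "http://source.example.edu:5984/vdsdata%2fd12%2f2007%2f1210882",
              "target": "vdsdata/d12/2007/1210882",
              "create_target": true}'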
Re: Confused handling of HEAD requests
Would any of this go away if we'd finally switched to Webmachine?

Cheers
Jan
--

On 21 Jul 2011, at 02:09, Paul Davis wrote:

> On Wed, Jul 20, 2011 at 5:03 PM, Travis Jensen travis.jen...@gmail.com wrote:
>> couch_httpd.erl seems to be confused about what it wants to do with HEAD requests. On the one hand, it supports catching {http_head_abort, Resp} and will throw that in start_response/3 and start_response_length/4 if your method is set to HEAD. On the other hand, it sets all HEAD requests to GET, so no handler can ever know a HEAD request was made (instead, it lets Mochiweb strip the body). I can appreciate the simplicity of the latter, but the schizophrenic behavior seems odd. I've also got a custom handler that would really like to know if it is HEAD or GET (generating the body takes a lot of CPU, but I know its length because I store it in a document).
>>
>> First question: should Couch really set all HEAD requests to GET? (Personally, I think not.)
>>
>> Second question: does anybody know how bad it would be to remove that HEAD -> GET mapping?
>>
>> Cheers.
>> tj
>
> On Wed, Jul 20, 2011 at 15:09, Paul Davis paul.joseph.da...@gmail.com wrote:
>> It would be bad since a lot of the handlers specifically match against the method being GET. I have a ticket open to do smarter things with HEAD in general, especially as it relates to caching proxies and ETags: https://issues.apache.org/jira/browse/COUCHDB-941 It's something we should definitely set about fixing eventually, but I don't know what the priority is.
>>
>> I don't have the answer at the tip of my fingers, but IIRC there was a specific interaction that we had to do that so that something else didn't break. I wonder if it's possible to tag the request with a special "is actually a HEAD request" thing so users can check.
>
> On Wed, Jul 20, 2011 at 5:15 PM, Randall Leeds randall.le...@gmail.com wrote:
>> I don't like an "is actually a HEAD request" flag. Adding HEAD handlers is the right approach, but if we want to be lazy we could support a fallback to GET when we get a function_clause error trying to call the handler.
>
> On Wed, Jul 20, 2011 at 15:20, Paul Davis paul.joseph.da...@gmail.com wrote:
>> Yeah, it's definitely a hack. A fallback on function_clause would definitely be much cleaner, I think. Only thing is I tend to wonder if there'd be a performance hit since our entire HTTP stack currently relies on HEAD -> GET, which would generate a lot of exception handling.
>
> On Wed, Jul 20, 2011 at 6:32 PM, Randall Leeds randall.le...@gmail.com wrote:
>> It'd only occur when people do a HEAD request, so normal operation would be fine? Clearly we'd want to log a warning or something and start implementing all the HEAD responses properly.
>
> Though I think some libraries will use HEAD requests to check if they can short circuit some operations. No idea what sort of density that'd be and it'd obviously be fairly lib/usecase specific.
>
> Also true, if we change this, adding real HEAD responses would be useful. Granted in a webmachine world this would all Just Work. I'd search through the dev@ list for chatter on that mapping around the time it was made. I'm pretty sure there was a thread that we did some discussion on that.
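For concreteness, a minimal sketch of the function_clause fallback Randall describes, with a simplified stand-in for the #httpd record (illustrative Erlang, not the actual couch_httpd code):

    -module(head_fallback_sketch).
    -export([dispatch/2]).

    %% Simplified stand-in for the record defined in couch_httpd.hrl.
    -record(httpd, {method, path}).

    %% Call the handler with the real method. If no clause matches a
    %% HEAD request, retry as GET and let the HTTP layer strip the body.
    dispatch(#httpd{method = 'HEAD'} = Req, Handler) ->
        try
            Handler(Req)
        catch
            error:function_clause ->
                %% Caveat from the thread: this also swallows genuine
                %% function_clause errors raised inside the handler,
                %% so a warning should probably be logged here.
                Handler(Req#httpd{method = 'GET'})
        end;
    dispatch(Req, Handler) ->
        Handler(Req).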
[jira] [Created] (COUCHDB-1230) Replication slows down over time
Replication slows down over time
--------------------------------

Key: COUCHDB-1230
URL: https://issues.apache.org/jira/browse/COUCHDB-1230
Project: CouchDB
Issue Type: Bug
Components: Replication
Affects Versions: 1.1, 1.0.2
Environment: Ubuntu 10.04
Reporter: Paul Hirst

I have two databases which were replicated in the past. One is running 1.0.2; I shall call this the source database. The other is running 1.1.0; I shall call this the target database. The source and target are bidirectionally replicated using a push and a pull replication from the target (using a couple of documents in the new _replicator database).

The source database is in production and is getting changes applied to it from live systems. The target is only participating in replication and isn't being used directly by any production systems.

The database has about 50 million documents; many of these will have been updated a handful of times. The database is about 500G after compaction, but the source database is currently at about 900G as it hasn't been compacted for a while.

The databases were replicated in the past, however this replication was torn down when the target was upgraded from 1.0.2 to 1.1.0. When replication was re-enabled the system wasn't able to pick up where it left off and had to re-enumerate all the documents again. This process initially started quickly but after a while ground to a halt, such that the target actually stopped making progress against the source database.

I found that restarting replication starts the process running again at a decent speed for a while. I did this by deleting and recreating the appropriate document in the _replicator database on the target.

I have graphed the last_seq of the target database against time for about a day, noting when replication was manually restarted. I shall try to attach the graph if possible. It shows a clear improvement in replication speed after restarting replication. I previously witnessed this behaviour between 1.0.2 databases but didn't grab any stats at the time, but I don't think it's a new problem.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
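The restart workaround described above amounts to deleting and re-creating the replication document; the doc id, rev, and host names below are placeholders:

    # look up the current _rev of the replication document
    curl http://target:5984/_replicator/my_pull_rep
    # delete it to cancel the running job...
    curl -X DELETE "http://target:5984/_replicator/my_pull_rep?rev=1-abc123"
    # ...then recreate it; the job restarts and resumes from its checkpoint
    curl -X PUT http://target:5984/_replicator/my_pull_rep \
         -d '{"source": "http://source:5984/somedb", "target": "somedb", "continuous": true}'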
[jira] [Updated] (COUCHDB-1230) Replication slows down over time
[ https://issues.apache.org/jira/browse/COUCHDB-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Hirst updated COUCHDB-1230:
--------------------------------

Attachment: sequence_number.png

This is the last sequence number of a replication target graphed over ~20 hours. It shows that restarting replication gives a speed boost and that after a while the speed diminishes.

> Replication slows down over time
> --------------------------------
>
> Key: COUCHDB-1230
> URL: https://issues.apache.org/jira/browse/COUCHDB-1230
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.0.2, 1.1
> Environment: Ubuntu 10.04
> Reporter: Paul Hirst
> Attachments: sequence_number.png
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (COUCHDB-1230) Replication slows down over time
[ https://issues.apache.org/jira/browse/COUCHDB-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Hirst updated COUCHDB-1230:
--------------------------------

Description: (edited so that the second paragraph reads "The target is only participating in replication and isn't being used directly by any production systems")

was: (the same text, with that sentence reading "The target is only participating in replication and it's being used directly by any production systems")

> Replication slows down over time
> --------------------------------
>
> Key: COUCHDB-1230
> URL: https://issues.apache.org/jira/browse/COUCHDB-1230
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.0.2, 1.1
> Environment: Ubuntu 10.04
> Reporter: Paul Hirst
> Attachments: sequence_number.png
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (COUCHDB-1231) Replication times out sporadically
[ https://issues.apache.org/jira/browse/COUCHDB-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Markham updated COUCHDB-1231:
----------------------------------

Attachment: Couchdb Filtered replication target timeout .txt
            Couchdb Filtered replication source timeout .txt

Log snippets attached.

> Replication times out sporadically
> ----------------------------------
>
> Key: COUCHDB-1231
> URL: https://issues.apache.org/jira/browse/COUCHDB-1231
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.0.2, 1.0.3
> Environment: CentOS 5.6 64 bit, XFS HDD drive. Spidermonkey 1.9.2 or 1.7
> Reporter: Alex Markham
> Labels: changes, replication, timeout
> Attachments: Couchdb Filtered replication source timeout .txt, Couchdb Filtered replication target timeout .txt
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (COUCHDB-1231) Replication times out sporadically
Replication times out sporadically
----------------------------------

Key: COUCHDB-1231
URL: https://issues.apache.org/jira/browse/COUCHDB-1231
Project: CouchDB
Issue Type: Bug
Components: Replication
Affects Versions: 1.0.2, 1.0.3
Environment: CentOS 5.6 64 bit, XFS HDD drive. Spidermonkey 1.9.2 or 1.7
Reporter: Alex Markham
Attachments: Couchdb Filtered replication source timeout .txt, Couchdb Filtered replication target timeout .txt

We have a setup replicating 7 databases from a master to a slave. 2 databases use filters. One of these databases (the infrequently updated one) is failing replication. We have a cronjob to poll replication once per minute, and these stack traces appear often in the logs. The network is a gigabit LAN, or 2 VMs on the same host (same result seen on both). The replication job is called by sshing into the target and then curling the source database to localhost (source -> target):

ssh TargetServer 'curl -sX POST -H "content-type:application/json" http://localhost:5984/_replicate -d "{\"source\":\"http://SourceServer:5984/DataBase\",\"target\":\"DataBase\",\"continuous\":true,\"filter\":\"productionfilter/notProcessingJob\"}"'

changes_timeout is not defined in the ini files. Logs attached for stack traces on the source couch and the target couch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (COUCHDB-1231) Replication times out sporadically
[ https://issues.apache.org/jira/browse/COUCHDB-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069019#comment-13069019 ]

Alex Markham commented on COUCHDB-1231:
---------------------------------------

I should add that if I manually poll the changes URL with the filter on using curl, it seems to work fine (though not tested for long periods):

http://SourceServer:5984/DataBase/_changes?filter=productionfilter/notProcessingJob&style=all_docs&heartbeat=1&since=40034&feed=continuous

> Replication times out sporadically
> ----------------------------------
>
> Key: COUCHDB-1231
> URL: https://issues.apache.org/jira/browse/COUCHDB-1231
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.0.2, 1.0.3
> Environment: CentOS 5.6 64 bit, XFS HDD drive. Spidermonkey 1.9.2 or 1.7
> Reporter: Alex Markham
> Labels: changes, replication, timeout
> Attachments: Couchdb Filtered replication source timeout .txt, Couchdb Filtered replication target timeout .txt
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (COUCHDB-1231) Replication times out sporadically
[ https://issues.apache.org/jira/browse/COUCHDB-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069020#comment-13069020 ]

Robert Newson commented on COUCHDB-1231:
----------------------------------------

The 'Reason for termination == changes_timeout' points at the internal use of the timer module rather than anything network related. I took a quick look at how the timer is set (and cancelled) and it looks OK. It does appear to be reset if a heartbeat is received, even if there's a filter.

> Replication times out sporadically
> ----------------------------------
>
> Key: COUCHDB-1231
> URL: https://issues.apache.org/jira/browse/COUCHDB-1231
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.0.2, 1.0.3
> Environment: CentOS 5.6 64 bit, XFS HDD drive. Spidermonkey 1.9.2 or 1.7
> Reporter: Alex Markham
> Labels: changes, replication, timeout
> Attachments: Couchdb Filtered replication source timeout .txt, Couchdb Filtered replication target timeout .txt
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
[ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069072#comment-13069072 ]

James Marca commented on COUCHDB-1226:
--------------------------------------

I stared at this a lot last night. It always fails on the replicator, not the replicatee, regardless of whether it is push or pull. What happens is that (on restart) it writes out a checkpoint, then the data gets pushed to the target, then it writes another checkpoint, say 30 more than the first one. While doing that, CPU is really high and RAM bobbles up and down but generally drifts up. Then it crashes, and I restart (this machine only has 4 gig, not 8) and it tries again. So it is something in the writing of the checkpoints, and possibly related to how big my docs are. Is there any way to ask it to write checkpoints more often than once every 25 to 30 documents?

Also, I am going to try this on my laptop, which is running Slackware not Gentoo, to see if maybe it is something in the tool chain or libraries that is causing the crash.

Regards,
James

> Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
> Key: COUCHDB-1226
> URL: https://issues.apache.org/jira/browse/COUCHDB-1226
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.1
> Environment: Gentoo Linux, CouchDB built using standard ebuild. Rebuilt July 2011.
> Reporter: James Marca
> Attachments: topcouch.log
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Confused handling of HEAD requests
On Jul 21, 2011 2:33 AM, Jan Lehnardt j...@apache.org wrote:
> Would any of this go away if we'd finally switched to Webmachine?

All of it.

> Cheers
> Jan
>
> [...]
Re: Confused handling of HEAD requests
On Thu, Jul 21, 2011 at 7:05 PM, Randall Leeds randall.le...@gmail.com wrote:
> On Jul 21, 2011 2:33 AM, Jan Lehnardt j...@apache.org wrote:
>> Would any of this go away if we'd finally switched to Webmachine?
>
> All of it.

Though there are other problems like range handling and such things. I think HEAD could also work if we stop throwing and instead check states.

- benoit
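A rough sketch of what "check states instead of throwing" might look like: decide up front, based on the method, whether to run the body-producing fun at all, rather than starting the response and throwing http_head_abort partway through. The return shapes below are invented for illustration, not CouchDB source:

    -module(head_state_sketch).
    -export([respond/3]).

    -record(httpd, {method}).

    %% HEAD: emit headers only and skip body generation entirely.
    respond(#httpd{method = 'HEAD'}, Length, _BodyFun) ->
        {headers_only, [{"Content-Length", integer_to_list(Length)}]};
    %% Everything else: generate the body as usual.
    respond(#httpd{}, Length, BodyFun) ->
        {response, [{"Content-Length", integer_to_list(Length)}], BodyFun()}.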
Re: Confused handling of HEAD requests
So is there anything I can do to help solve this right, or should I just put my hack in place for now and call it good for the time being?

tj

On Thu, Jul 21, 2011 at 11:38 AM, Benoit Chesneau bchesn...@gmail.com wrote:
> [...]
>
> Though there are other problems like range handling and such things. I think HEAD could also work if we stop throwing and instead check states.
>
> - benoit
Re: Confused handling of HEAD requests
On Thu, Jul 21, 2011 at 1:01 PM, Travis Jensen travis.jen...@gmail.com wrote:
> So is there anything I can do to help solve this right, or should I just put my hack in place for now and call it good for the time being?
>
> tj

We like to refer to it as "organically grown". :D

But in seriousness, the fix for this weirdness will probably take quite a bit of effort, which means it'll be a while. I'd stick to the hack for now; when we get a better HTTP layer built up, it should obviate the need for it.
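For what it's worth, the hack can stay small if the #httpd record's mochi_req field still carries the raw mochiweb request (the HEAD -> GET rewrite appears to touch only #httpd.method). A hedged sketch, with a stand-in record where a real handler would include couch_httpd.hrl:

    -module(head_hack_sketch).
    -export([is_head/1]).

    %% Stand-in; the real record lives in couch_httpd.hrl.
    -record(httpd, {mochi_req, method}).

    %% The rewrite only changes #httpd.method, so the underlying
    %% mochiweb request should still report the original method.
    is_head(#httpd{mochi_req = MochiReq}) ->
        MochiReq:get(method) =:= 'HEAD'.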
[jira] [Commented] (COUCHDB-1230) Replication slows down over time
[ https://issues.apache.org/jira/browse/COUCHDB-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069172#comment-13069172 ]

Randall Leeds commented on COUCHDB-1230:
----------------------------------------

If you have the ability and time to compile and test trunk in the same way, I would very much appreciate it. Filipe overhauled the replication code after 1.1 branched. It uses connection pooling much more sensibly and it's easier to reason about. Just make sure not to upgrade in place, since the database will get upgraded in a way that is not backwards-compatible. Perhaps if you can set up a new target we'll know whether this is reproducible now or outdated.

> Replication slows down over time
> --------------------------------
>
> Key: COUCHDB-1230
> URL: https://issues.apache.org/jira/browse/COUCHDB-1230
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.0.2, 1.1
> Environment: Ubuntu 10.04
> Reporter: Paul Hirst
> Attachments: sequence_number.png
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
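If it helps, trunk can be checked out and installed side by side with the existing 1.1 install, so nothing gets upgraded in place (the install prefix below is just an example):

    svn checkout http://svn.apache.org/repos/asf/couchdb/trunk couchdb-trunk
    cd couchdb-trunk
    ./bootstrap
    ./configure --prefix=/opt/couchdb-trunk
    make && make install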
[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
[ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069302#comment-13069302 ]

Filipe Manana commented on COUCHDB-1226:
----------------------------------------

Thanks for testing and reporting, James.

> Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
> Key: COUCHDB-1226
> URL: https://issues.apache.org/jira/browse/COUCHDB-1226
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.1
> Environment: Gentoo Linux, CouchDB built using standard ebuild. Rebuilt July 2011.
> Reporter: James Marca
> Attachments: topcouch.log
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (COUCHDB-1232) configure should check for spidermonkey version 1.8.5
configure should check for spidermonkey version 1.8.5
------------------------------------------------------

Key: COUCHDB-1232
URL: https://issues.apache.org/jira/browse/COUCHDB-1232
Project: CouchDB
Issue Type: Improvement
Components: Build System
Affects Versions: 1.2
Environment: Gentoo Linux
Reporter: James Marca
Priority: Minor

configure does not require spidermonkey 1.8.5, but trunk (1.2) failed to build without it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (COUCHDB-1232) configure should check for spidermonkey version 1.8.5
[ https://issues.apache.org/jira/browse/COUCHDB-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069308#comment-13069308 ]

James Marca commented on COUCHDB-1232:
--------------------------------------

The make step failed to find JS_NewObjectForConstructor.

> configure should check for spidermonkey version 1.8.5
> ------------------------------------------------------
>
> Key: COUCHDB-1232
> URL: https://issues.apache.org/jira/browse/COUCHDB-1232
> Project: CouchDB
> Issue Type: Improvement
> Components: Build System
> Affects Versions: 1.2
> Environment: Gentoo Linux
> Reporter: James Marca
> Priority: Minor
>
> configure does not require spidermonkey 1.8.5, but trunk (1.2) failed to build without it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
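A quick way to see whether the installed SpiderMonkey exports that symbol is an autoconf-style link test; the library name below assumes the 1.8.5 packaging as libmozjs185, so adjust for your distribution:

    # compile-and-link check only; the program is never executed
    echo 'char JS_NewObjectForConstructor(); int main(void) { return JS_NewObjectForConstructor(); }' \
      | gcc -x c - -lmozjs185 -o /dev/null 2>/dev/null \
      && echo "symbol present (SpiderMonkey >= 1.8.5)" \
      || echo "symbol missing"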
[jira] [Updated] (COUCHDB-1232) trunk does not build with spidermonkey 1.7
[ https://issues.apache.org/jira/browse/COUCHDB-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds updated COUCHDB-1232:
-----------------------------------

Description:
Original bug report: "configure does not require spidermonkey 1.8.5, but trunk (1.2) failed to build without it."

Trunk should fall back to a slightly older spidermonkey, and configure definitely fails if you don't have either. However, by popular demand I'm told I should make trunk compile with 1.7 again, so I will. I'm changing this ticket to track that. Thanks.

was: configure does not require spidermonkey 1.8.5, but trunk (1.2) failed to build without it.

Skill Level: Regular Contributors Level (Easy to Medium)
Fix Version/s: 1.2
Assignee: Randall Leeds
Remaining Estimate: 4h
Original Estimate: 4h
Summary: trunk does not build with spidermonkey 1.7 (was: configure should check for spidermonkey version 1.8.5)

> trunk does not build with spidermonkey 1.7
> ------------------------------------------
>
> Key: COUCHDB-1232
> URL: https://issues.apache.org/jira/browse/COUCHDB-1232
> Project: CouchDB
> Issue Type: Improvement
> Components: Build System
> Affects Versions: 1.2
> Environment: Gentoo Linux
> Reporter: James Marca
> Assignee: Randall Leeds
> Priority: Minor
> Fix For: 1.2
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (COUCHDB-1232) trunk does not build with spidermonkey 1.7
[ https://issues.apache.org/jira/browse/COUCHDB-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069309#comment-13069309 ]

Randall Leeds commented on COUCHDB-1232:
----------------------------------------

I'll look into the JS_NewObjectForConstructor issue, though I haven't had any trouble myself building against a slightly older SM.

> trunk does not build with spidermonkey 1.7
> ------------------------------------------
>
> Key: COUCHDB-1232
> URL: https://issues.apache.org/jira/browse/COUCHDB-1232
> Project: CouchDB
> Issue Type: Improvement
> Components: Build System
> Affects Versions: 1.2
> Environment: Gentoo Linux
> Reporter: James Marca
> Assignee: Randall Leeds
> Priority: Minor
> Fix For: 1.2
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
[ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds resolved COUCHDB-1226.
------------------------------------

Resolution: Fixed
Fix Version/s: 1.2

Fixed by Filipe in the new replicator. I suspect the pipelining may have been building up too many requests on the target, but no matter. If someone wants to investigate further and try to fix for 1.1.1 that'd be cool, but it might be easier to backport the new replicator (if that's kosher).

> Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
> Key: COUCHDB-1226
> URL: https://issues.apache.org/jira/browse/COUCHDB-1226
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.1
> Environment: Gentoo Linux, CouchDB built using standard ebuild. Rebuilt July 2011.
> Reporter: James Marca
> Fix For: 1.2
> Attachments: topcouch.log
>
> [...]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira