[ https://issues.apache.org/jira/browse/COUCHDB-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676584#comment-13676584 ]
Eli Stevens commented on COUCHDB-1817:
--------------------------------------

Of course, the next run had an issue. :(

Here's the stack trace from the first server 500 for the data run:
https://gist.github.com/wickedgrey/ef7b59e6b0be7ec47692

Here's the stack trace from trying to *read* the view in question:
https://gist.github.com/wickedgrey/d6cb6e0fa2190882977f

We've copied the DB files off, and are ready to send them to someone who would like to debug the issue.

> OS Process Error <0.21247.103> :: {os_process_error, {exit_status,0}}
> ---------------------------------------------------------------------
>
>                 Key: COUCHDB-1817
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1817
>             Project: CouchDB
>          Issue Type: Bug
>          Components: JavaScript View Server
>            Reporter: Eli Stevens
>         Attachments: couchdb__couchdb_files.png,
> couchdb__httpd_status_codes.png, couchdb_mem.png, loadavg.png, memory2.png
>
>
> We have started seeing errors crop up in our application that we have not
> seen before, and we're at a loss for how to start debugging it.
> [~dch] said that we might look into system resource limits, so we started
> collecting all of the output from _stats into RRD (along with memory, load,
> etc. that we were already collecting), but nothing is jumping out at us as
> obviously problematic.
> We can semi-reliably reproduce the problem, but it's far from a minimal test
> case (basically, we load up several large chunks of data, and then halfway
> through the processing run, we get the error). The error doesn't seem to
> happen if we load up each chunk by itself.
> The DB in question has about 100 docs in it, none particularly large (nothing
> over a couple KB would be my guess), with a couple hundred MB in attachments.
> 10ish design docs, coffeescript. In general, there isn't anything that
> seems obviously resource intensive.
> We have seen this issue on 1.2.0, 1.2.1, and we're working on getting a
> machine with 1.3.0 set up (the PPA we'd been using hasn't been updated yet).
> Ubuntu 12.04, spinning disk, etc. The system is under load when it happens,
> but the load isn't more than 1.5x the number of cores. I don't have disk IO
> numbers at hand, but I'd be surprised if that was being strained.
> Error as it appears in couch.log:
> https://gist.github.com/wickedgrey/e7fd3fc14b6d43e95564
> The design doc in question:
> https://gist.github.com/wickedgrey/db41b0c3c75a590e2109
> An example document: https://gist.github.com/wickedgrey/a8422aab261ddd2ce4fe
> We have some preliminary evidence that the problem persists after the system
> goes quiet, but we're not certain.
> Either CouchDB isn't handling things correctly, in which case this bug is
> "prz fix" or we're doing something wrong (hitting a resource limit, or
> something), in which case this bug is "prz make the error message more
> informative".
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira