[ https://issues.apache.org/jira/browse/COUCHDB-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676584#comment-13676584 ]

Eli Stevens commented on COUCHDB-1817:
--------------------------------------

Of course, the next run had an issue.  :(

Here's the stack trace from the first server 500 error during the data run: 
https://gist.github.com/wickedgrey/ef7b59e6b0be7ec47692

Here's the stack trace from trying to *read* the view in question: 
https://gist.github.com/wickedgrey/d6cb6e0fa2190882977f

We've copied the DB files off, and are ready to send them to someone who would 
like to debug the issue.
                
> OS Process Error <0.21247.103> :: {os_process_error, {exit_status,0}}
> ---------------------------------------------------------------------
>
>                 Key: COUCHDB-1817
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1817
>             Project: CouchDB
>          Issue Type: Bug
>          Components: JavaScript View Server
>            Reporter: Eli Stevens
>         Attachments: couchdb__couchdb_files.png, 
> couchdb__httpd_status_codes.png, couchdb_mem.png, loadavg.png, memory2.png
>
>
> We have started seeing errors crop up in our application that we have not 
> seen before, and we're at a loss for how to start debugging it.
> [~dch] said that we might look into system resource limits, so we started 
> collecting all of the output from _stats into RRD (along with memory, load, 
> etc. that we were already collecting), but nothing is jumping out at us as 
> obviously problematic.
> We can semi-reliably reproduce the problem, but it's far from a minimal test 
> case (basically, we load up several large chunks of data, and then halfway 
> through the processing run, we get the error).  The error doesn't seem to 
> happen if we load up each chunk by itself.
> The DB in question has about 100 docs in it, none particularly large (nothing 
> over a couple of KB, would be my guess), with a couple hundred MB in 
> attachments. There are about 10 design docs, written in CoffeeScript. In 
> general, nothing seems obviously resource intensive.
> We have seen this issue on 1.2.0 and 1.2.1, and we're working on getting a 
> machine with 1.3.0 set up (the PPA we'd been using hasn't been updated yet).  
> Ubuntu 12.04, spinning disk, etc.  The system is under load when it happens, 
> but the load isn't more than 1.5x the number of cores.  I don't have disk IO 
> numbers at hand, but I'd be surprised if that was being strained.
> Error as it appears in couch.log: 
> https://gist.github.com/wickedgrey/e7fd3fc14b6d43e95564
> The design doc in question: 
> https://gist.github.com/wickedgrey/db41b0c3c75a590e2109
> An example document: https://gist.github.com/wickedgrey/a8422aab261ddd2ce4fe
> We have some preliminary evidence that the problem persists after the system 
> goes quiet, but we're not certain.
> Either CouchDB isn't handling things correctly, in which case this bug is 
> "prz fix", or we're doing something wrong (hitting a resource limit, or 
> something), in which case this bug is "prz make the error message more 
> informative".
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
