Re: Post-mortem

Kevin R. Coombes Thu, 10 May 2012 12:11:41 -0700

This is obviously my off-the-cuff immediate reaction. My first observation is that it is tremendously useful to have this kind of detailed real-world feedback on what is not working. Now we just need to decide what to do with it.

In the posted list of availability issues, even though there are five bullets, there are only really two issues. (I'm not counting the bugs listed as the fifth bullet. They're fixed. Other bugs will crop up. As Couch matures, one expects this issue to decrease in importance.) In summary:


Issue 1:

Compaction fails silently. I've noticed this myself, and that is clearly something that has to be fixed. Failures will happen sometimes. They shouldn't be silent. Especially when that kind of silent failure can eat a tremendous amount of disc space.


Issue 2:

Queries fail because of slow disk performance or while reindexing. Reindexing can fail, or can take an extraordinarily long time. While one view is being reindexed, all views *from that design document* fail. (The performance problem listed in the post, I think, comes down to the same thing. And so do most of the maintenance problems.)

My experience with this issue is also similar, but I've added the phrase about design documents. I have a couple of databases (one with 3.5M documents) with multiple views defined in the same design document. I made the mistake of trying to develop an application on this database. It was really painful, every time I decided that one view needed to be changed, to wait a couple of hours while all the views got rebuilt. (For that purpose, I made a filtered version of the database with about 10K documents to use during view development.) Although I haven't tested it, I'm planning to move to a structure that puts one view per design document to see if the other views remain usable while one of them rebuilds. Since other databases remain usable, I expect that this will work. It would be good to have advice somewhere on the Couch web site or wiki about how to organize views into documents, with more details about how that might affect performance.
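To make the one-view-per-design-document idea concrete, here is a minimal sketch of how an existing design document could be split up. It only manipulates plain dictionaries and does not talk to a server. The function name `split_design_doc`, the `_design/app` example, and the `<original id>_<view name>` naming scheme are all assumptions made for illustration, and, as noted above, whether the other views really stay usable during a rebuild is untested.

```python
import copy


def split_design_doc(ddoc):
    """Return one design document per view found in `ddoc`.

    `ddoc` is a design document as a plain dict. The new "_id" values
    follow a hypothetical naming scheme: "<original id>_<view name>".
    """
    base_id = ddoc["_id"]
    language = ddoc.get("language", "javascript")
    result = []
    for view_name, view_def in sorted(ddoc.get("views", {}).items()):
        result.append({
            "_id": f"{base_id}_{view_name}",
            "language": language,
            # Each new design document carries exactly one view.
            "views": {view_name: copy.deepcopy(view_def)},
        })
    return result


# Hypothetical design document with two views.
example = {
    "_id": "_design/app",
    "views": {
        "by_date": {"map": "function(doc) { emit(doc.date, null); }"},
        "by_author": {"map": "function(doc) { emit(doc.author, null); }"},
    },
}

split_docs = split_design_doc(example)
for d in split_docs:
    print(d["_id"], list(d["views"]))
```

Each resulting document would then be saved to the database like any other design document, and applications would query the new design document names.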

I don't know if it is possible to restructure the code to serve up other views from the same design document while one is being rebuilt. And while I know about "stale=ok" or "stale=update_after", both of those are hard to use from web sites that access the database, since they require modifying the URL. And the "update_after" version only helps the first user, and just pushes the burden of waiting onto the next user. If you have an active site with lots of users making queries, there is still going to be a performance hit.
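For reference, "modifying the URL" amounts to something like the following sketch, which only builds the query URL and does not contact a server. The helper name `view_url`, the host, and the database, design document, and view names are placeholders, not anything from a real deployment.

```python
from urllib.parse import quote, urlencode


def view_url(base, db, ddoc, view, stale=None, **params):
    """Build a view query URL, optionally adding the stale parameter."""
    if stale not in (None, "ok", "update_after"):
        raise ValueError("stale must be None, 'ok' or 'update_after'")
    query = dict(params)
    if stale is not None:
        query["stale"] = stale
    path = "{}/{}/_design/{}/_view/{}".format(
        base.rstrip("/"),
        quote(db, safe=""),
        quote(ddoc, safe=""),
        quote(view, safe=""),
    )
    return path + "?" + urlencode(query) if query else path


# Placeholder host and names, for illustration only.
print(view_url("http://localhost:5984", "mydb", "app", "by_date", stale="ok"))
```

Every page or script that queries the view has to be changed to go through something like this, which is the inconvenience described above.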

Perhaps the solution is to make it possible to configure the server (on a per-database level? or globally?) to *always* return stale views while a view is rebuilding, and just mark them as stale. Perhaps another reserved word, either returning something like

    _stale : true | false
or
    _currency: "stale" | "current"

so the user or script could decide whether to use that data or wait until the view is rebuilt (which begins suggesting other changes so you can query if the view is done rebuilding, but I won't follow that tangent any further at the moment).
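A minimal sketch of how a client script might act on such a marker, assuming the first form of the proposal. The `_stale` field does not exist in Couch; it is only the suggestion made above, and the function name `rows_if_acceptable` and the sample responses are invented for illustration.

```python
def rows_if_acceptable(response, allow_stale):
    """Return the rows from a view response, or None if the caller
    should wait for the rebuild instead.

    `response` is a parsed view result (a dict). The "_stale" key is
    the hypothetical marker proposed above, not an existing field.
    """
    is_stale = bool(response.get("_stale", False))
    if is_stale and not allow_stale:
        return None
    return response.get("rows", [])


# Invented sample responses for illustration.
stale_response = {"_stale": True, "rows": [{"key": "a", "value": 1}]}
fresh_response = {"_stale": False, "rows": [{"key": "a", "value": 2}]}

print(rows_if_acceptable(stale_response, allow_stale=True))
print(rows_if_acceptable(stale_response, allow_stale=False))
print(rows_if_acceptable(fresh_response, allow_stale=False))
```

A site that can tolerate slightly old data would pass `allow_stale=True` and never block its users; a script that needs current data would get `None` back and retry later.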

I think the bottom line is that some serious attention needs to be paid to real-world performance issues, primarily centered around rebuilding views and compaction.


    -- Kevin

On 5/10/2012 12:53 PM, Noah Slater wrote:

Guys,

What can we learn from this:

http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/


Thanks,

N
