Following up on this. After moving to real hardware, my view index time for the same data set dropped from 25 minutes to 6 minutes, so the VM was definitely a factor. If there are any other optimizations I can make, I'd love to know what they are. Thanks.
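For reference, the arithmetic behind those timings can be sketched in a few lines of Python. This assumes the data set is the roughly 300K documents mentioned in the quoted thread; the function name is just for illustration. On those assumptions, 25 minutes works out to about 200 docs/sec (5 ms per document), which is somewhat slower than the ~500 docs/sec figure quoted below, and 6 minutes works out to about 833 docs/sec (1.2 ms per document).

```python
# Assumption: ~300K documents, the figure mentioned in the quoted thread.
DOCS = 300_000


def throughput(minutes, docs=DOCS):
    """Return (docs per second, milliseconds per doc) for an indexing run."""
    seconds = minutes * 60
    return docs / seconds, 1000 * seconds / docs


vm_rate, vm_ms = throughput(25)   # run inside the VM
hw_rate, hw_ms = throughput(6)    # run on real hardware
speedup = 25 / 6                  # how much faster the hardware run was

print(f"VM:       {vm_rate:.0f} docs/sec, {vm_ms:.1f} ms/doc")
print(f"Hardware: {hw_rate:.0f} docs/sec, {hw_ms:.1f} ms/doc")
print(f"Speedup:  {speedup:.1f}x")
```

By the same arithmetic, getting under a minute would need more than 5,000 docs/sec, and 10 seconds would need about 30,000 docs/sec.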
On Thu, Jul 3, 2008 at 9:35 AM, Brad King <[EMAIL PROTECTED]> wrote:
> That would be fantastic, but it sounds like other users are seeing
> performance similar to what I see. When you say tuning and
> optimizations, are you talking about code changes in future versions
> of couchdb or parameters we can change now? VM is definitely a
> variable. I probably should try this out on real hardware too and
> compare.
>
> On Wed, Jul 2, 2008 at 7:30 PM, Damien Katz <[EMAIL PROTECTED]> wrote:
>> This sounds really slow, like somethings wrong. 25 minutes to process 300k
>> means ~500 docs sec, or each document takes 2ms. That's a really long time
>> CPU wise.
>>
>> Assuming it's not another VM bug, we should be able about to get that down
>> to under minute with some tuning, and probably closer to 10 secs after
>> serious optimizations.
>>
>> -Damien
>>
>>
>> On Jul 2, 2008, at 6:28 PM, Chris Anderson wrote:
>>
>>> On Wed, Jul 2, 2008 at 3:08 PM, Paul Davis <[EMAIL PROTECTED]>
>>> wrote:
>>>>
>>>> I'd have to go back and double check, but off the top of my head 25
>>>> min for 300K docs seems about like what I was getting. Ie, not orders
>>>> of magnitude slower or anything.
>>>
>>> In my experience, views generate about 1/2 as fast as that, if not
>>> more slowly. My views are often quite complex with a lot of internal
>>> looping and multiple emits, so that probably explains it. In short,
>>> the times you're reporting seem reasonable.
>>>
>>> The bottleneck (based on my extremely unscientific use of top) doesn't
>>> seem to be the view server, but rather CouchDB's beam process, which
>>> as I understand it, is busy sorting the results as they come back from
>>> the view server. So the quickest route to parallelizing this may be to
>>> manually partition your data across CouchDB instances, generate the
>>> views, and query them in parallel, merging the results in your
>>> application.
>>>
>>> I don't actually plan to do all that work until my insert rate
>>> eclipses CouchDB's view generation speed. :)
>>>
>>> Once upon a time there was a feature to return the available results
>>> of a view, even while generation is still occurring. The feature has
>>> fallen by the wayside, and it would be non-trivial to turn it back on,
>>> according to Damien on IRC. Maybe if it would be useful to enough
>>> people, we'll see it again.
>>>
>>> --
>>> Chris Anderson
>>> http://jchris.mfdz.com
>>
>>
>
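On the partitioning idea in Chris's message above (split the data across CouchDB instances, query the views in parallel, merge in the application), the merge step might look roughly like the Python sketch below. This is only an illustration under stated assumptions: fetch_rows and the canned FAKE_PARTITIONS data are hypothetical stand-ins for a real view query against each instance, and the keys are assumed to be simple values that Python orders the same way the view does. Since each partition returns its rows already sorted by key, a k-way merge is enough and no full re-sort is needed.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned data standing in for three CouchDB instances.
# Each list is already sorted by key, as rows from a view query would be.
FAKE_PARTITIONS = {
    "node-a": [{"key": 1, "value": "a1"}, {"key": 4, "value": "a4"}],
    "node-b": [{"key": 2, "value": "b2"}, {"key": 3, "value": "b3"}],
    "node-c": [{"key": 2, "value": "c2"}, {"key": 5, "value": "c5"}],
}


def fetch_rows(partition):
    """Hypothetical helper: return one partition's view rows, sorted by key.

    A real version would issue an HTTP request to that instance's view.
    """
    return FAKE_PARTITIONS[partition]


def query_all(partitions):
    """Query every partition in parallel, then merge the sorted row lists."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_partition = list(pool.map(fetch_rows, partitions))
    # heapq.merge performs a k-way merge of inputs that are already sorted.
    return list(heapq.merge(*per_partition, key=lambda row: row["key"]))


merged = query_all(sorted(FAKE_PARTITIONS))
merged_keys = [row["key"] for row in merged]
print(merged_keys)
```

Reduce views would need an extra step to combine values for keys that appear in more than one partition; the sketch only covers merging map rows.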
