> On 06/17/2008 12:57 PM, Brian Whitman wrote:
>
>> On Jun 17, 2008, at 3:11 PM, Brad Schick wrote:
>>
>>> I am seeing very poor write performance running CouchDB on Ubuntu
>>> Linux 8.04. I built CouchDB from source about a week ago, and I am
>>> using the Python wrapper to access it all on localhost.
>>>
>>> I am trying to add over 90,000 documents (each is about 400 bytes)
>>> and finding that I can only add about 16 documents per second. And
>>> while this is happening, my CPU is about 75% idle.
>>
>> Are you using the update method in python (which uses _bulk_docs)?
>> You need the svn version of the python wrapper for it.
>>
>> On a slow machine (dual proc 1.8 GHz) using the python wrapper I
>> added 100K docs 10K at a time, each with a random string, uuid,
>> float, and int in 180 seconds (roughly 550 documents/second). One
>> CPU was pegged at 100% with beam.smp during the update, and the
>> python client did not even rate on top during the update (it was
>> just waiting for a response).
>>
>> I couldn't increase my chunk size much past 10K -- couch would
>> return a (very long) error if I tried five 20K chunks, for example.
>
> I have the code from svn, but I haven't tried bulk uploading yet
> because I've been primarily testing schema.py, which currently only
> does single-document stores. Maybe I'll hook that up to bulk writes
> and see how it goes.
Thanks for the tip. I switched to bulk updates and can now saturate the
CPU, topping out at about 240 docs/second uploaded (and much of that
time is spent in the client). It would be nice to have a tunable
write-batching setting in future versions of CouchDB.

If anyone is interested, I rearranged the code in schema.py a bit so
that schema classes derive from client.Document. That makes it easy to
call db.update with a list of them.

-Brad
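P.S. For anyone who wants to try the same thing, here is a minimal
sketch of the bulk path using the svn version of couchdb-python. The
database name, document fields, and batch size below are just
placeholders for illustration:

    import couchdb  # svn version of the python wrapper

    server = couchdb.Server('http://localhost:5984/')
    db = server['mydb']  # assumes this database already exists

    # ~90K small docs; the payload is a stand-in for real fields
    docs = [{'type': 'test', 'payload': 'x' * 400}
            for _ in range(90000)]

    # db.update sends each slice to _bulk_docs in a single request.
    # 10K per request worked for Brian; much larger chunks errored out.
    batch = 10000
    for i in range(0, len(docs), batch):
        db.update(docs[i:i + batch])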
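And this is roughly the shape of the schema.py change, not the exact
code I ended up with. The Person class and its fields are made up for
the example; the point is just that once schema classes derive from
client.Document, instances already look like plain documents, so a
list of them can go straight to db.update:

    from couchdb import schema

    # In the rearranged schema.py, schema.Document ultimately derives
    # from client.Document, so instances serialize like plain docs.
    class Person(schema.Document):
        name = schema.TextField()
        age = schema.IntegerField()

    people = [Person(name='person-%d' % i, age=i)
              for i in range(1000)]
    db.update(people)  # one _bulk_docs call instead of 1000 store()s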
