[ https://issues.apache.org/jira/browse/COUCHDB-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980498#action_12980498 ]

Randall Leeds commented on COUCHDB-1023:
----------------------------------------

Didn't I do this work already and not notice any significant gains? I haven't 
looked to see if maybe you did it differently, but here's a version with an 
append_terms.

https://github.com/tilgovi/couchdb/tree/realbatchwrite

I also have branches and patches where I experimented with other ways of 
changing this code path. I do like calling term_to_binary before the 
gen_server:call to couch_file because it should avoid a copy operation, but you 
have to consider what other path you're slowing down as a result.

If it doesn't get too complex for too little gain, my next thought would be to 
take the caching code you worked on before and use it to build a write-through 
cache that we flush asynchronously. The goal would be to let the updater do as 
much as possible, up to the flush of the next commit group, while the current 
one is being written. Something like this?
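The idea above can be sketched outside Erlang; this is a minimal Python analogue (the class name, structure, and threading approach are illustrative assumptions, not CouchDB's actual couch_file API):

```python
import threading

class BatchedWriter:
    """Hypothetical sketch of a write-through batch flushed asynchronously:
    callers append to the *next* commit group while the current one is
    being written out in the background."""

    def __init__(self, fileobj):
        self._file = fileobj
        self._lock = threading.Lock()
        self._pending = []       # next commit group, still accepting writes
        self._flusher = None     # thread writing the current group

    def append(self, chunk: bytes) -> None:
        # Write-through into the pending group; never blocks on disk I/O.
        with self._lock:
            self._pending.append(chunk)

    def flush(self) -> None:
        # Swap out the pending group and write it as one batch in the
        # background, so new appends accumulate for the next commit.
        with self._lock:
            batch, self._pending = self._pending, []
        if self._flusher is not None:
            self._flusher.join()  # allow only one in-flight flush at a time
        self._flusher = threading.Thread(target=self._write_batch,
                                         args=(batch,))
        self._flusher.start()

    def _write_batch(self, batch) -> None:
        # One write call per commit group instead of one per document.
        self._file.write(b"".join(batch))

    def close(self) -> None:
        self.flush()
        if self._flusher is not None:
            self._flusher.join()
```

The point of the swap-under-lock is that appends for the next group are never serialized behind the disk write of the current one.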

> Batching writes of BTree nodes (when possible) and in the DB updater
> --------------------------------------------------------------------
>
>                 Key: COUCHDB-1023
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1023
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Filipe Manana
>
> Recently I started experimenting with batching writes in the DB updater.
> In a test with 100 concurrent writers of 1 KB documents, for example, the 
> updater most often collects between 20 and 30 documents to write.
> Currently it does a file:write operation for each one. Not only is this 
> slower, it also implies more context switches and stresses the OS/filesystem 
> by allocating a few blocks very often (since we use a pure append-only write 
> mode). The same can be done for the BTree node writes.
> The following branch/patch is an experiment with batching writes:
> https://github.com/fdmanana/couchdb/compare/batch_writes
> In couch_file there's a quick test method that compares the time taken to 
> write X blocks of size Y versus writing a single block of size X * Y.
> Example:
> Eshell V5.8.2  (abort with ^G)
> 1> Apache CouchDB 1.2.0aa777195-git (LogLevel=info) is starting.
> Apache CouchDB has started. Time to relax.
> [info] [<0.37.0>] Apache CouchDB has started on http://127.0.0.1:5984/
> 1> couch_file:test(1000, 30).
> multi writes of 30 binaries, each of size 1000 bytes, took 1920us
> batch write of 30 binaries, each of size 1000 bytes,  took 344us
> ok
> 2> 
> 2> couch_file:test(4000, 30).
> multi writes of 30 binaries, each of size 4000 bytes, took 2002us
> batch write of 30 binaries, each of size 4000 bytes,  took 700us
> ok
> 3> 
> A 3x to 5x reduction is quite significant, I would say.
> Lower response times are most noticeable when delayed_commits is set to 
> true.
> Running a writes only test with this branch gave me:
> http://graphs.mikeal.couchone.com/#/graph/8bf31813eef7c0b7e37d1ea25902e544
> While with trunk I got:
> http://graphs.mikeal.couchone.com/#/graph/8bf31813eef7c0b7e37d1ea25902eb50
> These tests were done on Linux with ext4 (and OTP R14B01).
> However, I'm still not 100% sure this is worth applying to trunk.
> Any thoughts?
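The couch_file:test/2 comparison quoted above can be approximated in any language; here is a rough Python sketch of the same idea (N small appends versus one concatenated append; the function name is made up and absolute timings will differ by platform and filesystem):

```python
import os
import tempfile
import time

def bench(block_size: int, count: int):
    """Compare `count` separate appends of `block_size` bytes against a
    single append of the concatenated buffer. Returns (multi, batch)
    elapsed times in seconds; numbers are platform-dependent."""
    chunk = b"x" * block_size

    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
    try:
        # Many small appends: one write syscall per block.
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        t0 = time.perf_counter()
        for _ in range(count):
            os.write(fd, chunk)
        os.fsync(fd)
        multi = time.perf_counter() - t0
        os.close(fd)

        # One batched append: a single write syscall for the whole group.
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        t0 = time.perf_counter()
        os.write(fd, chunk * count)
        os.fsync(fd)
        batch = time.perf_counter() - t0
        os.close(fd)

        return multi, batch
    finally:
        os.unlink(path)
```

On most systems the batched variant wins because the per-syscall and per-allocation overhead is paid once per commit group rather than once per document, which is the effect the quoted numbers show.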

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.