So I cobbled together the beginnings of a wiki page for a Riak and CouchDB comparison based on this thread. It's in a branch called "riak-couchdb-comparison" in the wiki repo.
https://github.com/basho/riak_wiki/tree/riak-couchdb-comparison

As you'll see, it's _very_ sparse, and needs a lot of love and expanding from people who know CouchDB better than I do. I've also added a corresponding issue to the wiki repo:

https://github.com/basho/riak_wiki/issues/issue/22

A little help fleshing this out would be much appreciated.

Thanks,

Mark

On Fri, Jan 28, 2011 at 8:44 AM, Alexander Sicular <[email protected]> wrote:
> Jamie hits a lot of points; I'll cover some, mind the overlap.
>
> Couch has a built-in indexing mechanism based on a b-tree. Every view gets
> its own b-tree. The larger your data set, the longer it takes to build the
> b-tree.
>
> Couch also knows when the b-tree was built in relation to data on disk via
> internal sequence identifiers for each record. If your index is older than
> the data on disk, it will update the b-tree with the new data. Riak has no
> native indexing mechanism. If you want to reuse m/r results you need to
> stash them somewhere.
>
> The subscription mechanism for changes is very nice but works because couch
> is not a distributed system. It is a replicated system. That feature should
> be worked into all RDBMSs. Very handy.
>
> The big win for couch is that you can arrange it in all sorts of
> interesting topologies for replication. It can be used in offline systems
> that need to sync when they get back online.
>
> The couch guys are also working to scale couch down so you can use couch on
> phones and other portable devices.
>
> Like riak, couch is erlang based so you get all the erlang love. But unlike
> riak, couch is only accessible over http. Riak has protocol buffers and
> native access.
>
> Couch uses a wol (write-only, i.e. append-only, log) like bitcask (the
> default backend for riak). They both need to be compacted to reclaim space.
> But unlike couch, riak can also use other backends in the same cluster,
> which gives you flexibility.
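To make the write-only log and compaction point above concrete, here is a minimal Python sketch of an append-only key/value file. It is a toy under stated assumptions, not how CouchDB or Bitcask actually lay out data on disk: the class name `AppendOnlyStore`, the JSON-lines record format, and the in-memory offset index are all made up for illustration, and deletes are not handled.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy append-only key/value log (hypothetical; not a real on-disk format)."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(self.path, "ab").close()  # make sure the file exists

    @staticmethod
    def _encode(key, value):
        return (json.dumps({"k": key, "v": value}) + "\n").encode("utf-8")

    def put(self, key, value):
        # Updates never overwrite: the new record goes at the end of the file,
        # and the old record becomes dead space until compaction.
        offset = os.path.getsize(self.path)
        with open(self.path, "ab") as f:
            f.write(self._encode(key, value))
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["v"]

    def compact(self):
        # Copy only the live (latest) record per key into a fresh file,
        # then swap it in place of the old log.
        tmp_path = self.path + ".compact"
        new_index = {}
        with open(tmp_path, "wb") as out:
            for key in self.index:
                new_index[key] = out.tell()
                out.write(self._encode(key, self.get(key)))
        os.replace(tmp_path, self.path)
        self.index = new_index


def demo():
    """Write one key 100 times, compact, and report sizes before and after."""
    with tempfile.TemporaryDirectory() as d:
        store = AppendOnlyStore(os.path.join(d, "data.log"))
        for i in range(100):
            store.put("counter", i)
        store.put("name", "riak")
        before = os.path.getsize(store.path)
        store.compact()
        after = os.path.getsize(store.path)
        return before, after, store.get("counter"), store.get("name")
```

Calling `demo()` shows the trade-off: the file grows with every update, and only compaction reclaims the space taken by superseded records.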
> And as hit on already, but imho the biggest difference between couch and
> riak is that in order to scale couch you need to implement a sharding
> layer to split your data between multiple couches (see BigCouch, Lounge).
> Riak is a distributed system, so all you need to do to scale riak is add
> more nodes. I once tweeted something like "couch: divide and conquer.
> Riak: one ring to rule them all."
>
> Best, Alexander
>
> @siculars on twitter
> http://siculars.posterous.com
>
> Sent from my iPhone
>
> On Jan 28, 2011, at 8:29, Jamie Talbot <[email protected]> wrote:
>
> Hey Joshua,
>
> I'm relatively new to Riak, but have done quite a bit of investigation
> into CouchDB, so this is as much to confirm my own understanding as
> anything. With that disclaimer out of the way, here's what I understand
> about the two.
>
> Couch has excellent database consistency - killing the server process dead
> won't lose you any data, and recovering after a crash is very quick. Fault
> tolerance I would say is Riak's biggest selling point, with the ability to
> configure how many nodes can fail before results can no longer be returned
> or written. You can kind of achieve fault tolerance with Couch by
> load-balancing behind a proxy, but it's a kludge compared to the
> fault tolerance that is at the very heart of Riak.
>
> Both CouchDB and Riak have map/reduce functionality available through
> REST, using Erlang or JavaScript. With Couch, querying the data can be
> problematic though, especially on large sets of data, as you have to
> pre-define views of how you want to extract data and then wait for them to
> be built. It's certainly not true that you can just choose any old design
> and then figure things out later. Building views can take a long time - on
> a few hundred million rows in my sample, it took a number of weeks to
> build one relatively minor view (though hardware was quite limited). This
> makes RAD with CouchDB difficult, and was a significant business risk.
> The upside here is that, once built, I could query 7 years of ISP data at
> year, month, day, hour, minute granularity, across any cross-section of
> services, in a handful of milliseconds. This was incredible, and pretty
> addictive - it's lightning fast, for very specific use cases.
>
> The space requirements for Couch are enormous though, as updates and even
> deletes increase the size of the DB until it is compacted. Riak too will
> use additional space to store duplicate copies of data on different nodes,
> to provide fault tolerance, though from my experiments the overhead is
> nothing like that of recent versions of CouchDB for my specific use cases.
> Your mileage will vary greatly, based on your configuration of Riak and
> the characteristics of your Couch views.
>
> Riak, from what I understand, is not currently particularly well-suited to
> retrieving large amounts of data sequentially by key, but CouchDB works
> very quickly here, as long as you have defined a suitable view.
>
> Couch does bi-directional replication, though I did find that a little
> flaky, sometimes dying for no reason. No data loss of course, and it did
> eventually sync, but frustrating nonetheless. This was as of the previous
> version. Riak does replication of data as part of its architecture, but if
> you want to scale to multiple datacentres, you need the enterprise,
> non-free version.
>
> Scalability is hard with Couch, from what I can tell - you certainly can't
> just add a new node for better performance like you can with Riak. For me,
> this is a killer feature of Riak.
>
> Couch has a nice subscription mechanism for changes to the database, which
> allows you to set triggers and the like. Don't be fooled by the talk of
> document versioning though - it is built in, but it exists purely so that
> MVCC (replication and concurrency) can work, and old versions of documents
> are specifically removed whenever the database is compacted.
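The point about document versioning can be sketched in a few lines of Python. This is only an illustration under assumptions, not CouchDB's implementation: `RevisionedStore` and `Conflict` are hypothetical names, revisions are plain integers (real CouchDB revision ids look different), and the choice to keep old revision numbers while dropping their bodies at compaction is a simplification made for this sketch.

```python
class Conflict(Exception):
    """Raised when a writer does not hold the current revision."""


class RevisionedStore:
    """Toy MVCC document store (hypothetical; integer revs for simplicity)."""

    def __init__(self):
        self.docs = {}  # doc_id -> list of [rev, body], oldest first

    def put(self, doc_id, body, rev=None):
        history = self.docs.get(doc_id, [])
        current_rev = history[-1][0] if history else None
        # Optimistic concurrency: the caller must name the revision it read.
        if rev != current_rev:
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        history.append([new_rev, dict(body)])
        self.docs[doc_id] = history
        return new_rev

    def get(self, doc_id, rev=None):
        # Latest body by default; a specific revision only if it still exists.
        for stored_rev, body in reversed(self.docs.get(doc_id, [])):
            if rev is None or stored_rev == rev:
                return body
        return None

    def compact(self):
        # Drop the bodies of every revision except the latest one.
        for history in self.docs.values():
            for entry in history[:-1]:
                entry[1] = None


store = RevisionedStore()
first = store.put("doc", {"x": 1})
second = store.put("doc", {"x": 2}, rev=first)
store.compact()  # after this, the body stored under `first` is gone
```

So revisions are there to detect conflicting writes, not to serve as a history you can rely on: once `compact()` has run, asking for the old revision returns nothing.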
>
> This page has a high-level comparison of a number of NoSQL options,
> including Riak and CouchDB, which was generally considered to be pretty
> reasonable: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
>
> Hopefully that's a reasonable representation of the two systems. I will
> let more seasoned pros correct and expand on the above as necessary!
>
> Cheers,
>
> Jamie.
>
> PS: Hello, list!
>
> On Fri, Jan 28, 2011 at 21:44, Joshua Partogi <[email protected]> wrote:
>>
>> Hi there.
>>
>> Has anyone here done any comparison between Riak and CouchDB? I am
>> interested to see how similar and different Riak is compared to Couch. If
>> this can be added to the Riak wiki, I think it would be great for all of
>> us here.
>>
>> Thanks heaps.
>>
>> Kind regards,
>> Joshua.
>> --
>> http://twitter.com/jpartogi
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
