Jamie hits a lot of points. I'll cover some, so mind the overlap.

Couch has a built-in indexing mechanism called a b-tree. Every view gets its own b-tree. The larger your data set, the longer it takes to build the b-tree.

Couch also knows when the b-tree was built in relation to data on disk via internal sequence identifiers for each record. If your index is older than the data on disk, couch updates the b-tree with the new data. Riak has no native indexing mechanism, so if you want to reuse map/reduce results you need to stash them somewhere.
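To make the sequence-number idea concrete, here is a toy sketch in Python. It is not how couch is implemented: a dict stands in for the b-tree, and every name in it (ToyDatabase, ToyView, update_seq, indexed_seq) is made up for illustration. The point is only that the view remembers how far it has read and catches up on newer writes at query time.

```python
# Toy model only: a dict stands in for the on-disk b-tree, and all names
# here are hypothetical, not couch's internals.

class ToyDatabase:
    def __init__(self):
        self.update_seq = 0      # bumped on every write
        self.docs = {}           # doc_id -> (seq of last write, doc)

    def put(self, doc_id, doc):
        self.update_seq += 1
        self.docs[doc_id] = (self.update_seq, doc)

    def changes_since(self, seq):
        """Return (seq, doc_id, doc) for every write newer than seq, oldest first."""
        newer = [(s, i, d) for i, (s, d) in self.docs.items() if s > seq]
        return sorted(newer, key=lambda row: row[0])


class ToyView:
    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.indexed_seq = 0     # how far into the database this index has read
        self.rows = {}           # doc_id -> list of (key, value) pairs emitted

    def query(self, db):
        # Catch up only on what changed since the index was last updated.
        for seq, doc_id, doc in db.changes_since(self.indexed_seq):
            self.rows[doc_id] = list(self.map_fn(doc))
            self.indexed_seq = seq
        return sorted(kv for emitted in self.rows.values() for kv in emitted)
```

The first query on a large database pays for the whole build. Later queries only process documents written since indexed_seq.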

The subscription mechanism for changes is very nice but works because couch is not a distributed system. It is a replicated system. That feature should be worked into all RDBMSs. Very handy.

The big win for couch is that you can arrange it in all sorts of interesting topologies for replication. It can be used in offline systems that need to sync when they get back online.

The couch guys are also working to scale couch down so you can use couch on phones and other portable devices.

Like riak, couch is Erlang based so you get all the Erlang love. But unlike riak, couch is only accessible over HTTP. Riak has Protocol Buffers and native Erlang access.

Couch uses an append-only log, like bitcask (the default backend for riak). Both need to be compacted to reclaim space. But unlike couch, riak can also use other backends in the same cluster, which gives you flexibility.
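A rough sketch of why an append-style log grows until compacted. This is an in-memory toy, neither couch's file format nor bitcask's, and AppendOnlyStore with its methods is invented for the example. Updates and deletes only ever add entries, and compaction rewrites the log with just the live ones.

```python
# Toy in-memory model of a log that is only ever appended to.
# AppendOnlyStore is a hypothetical name, not a real couch or riak API.

class AppendOnlyStore:
    def __init__(self):
        self.log = []        # every write ever made, in order
        self.index = {}      # key -> position of the latest entry in the log

    def put(self, key, value):
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def delete(self, key):
        # A delete is also an append: a tombstone entry.
        self.log.append((key, None))
        self.index.pop(key, None)

    def get(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]

    def compact(self):
        """Rewrite the log keeping only the live entry for each key."""
        live = [self.log[pos] for pos in sorted(self.index.values())]
        self.log = live
        self.index = {key: pos for pos, (key, _) in enumerate(live)}
```

Two updates to one key plus a put and delete of another leave four entries in the log. After compact() only the single live entry remains.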

As already mentioned, imho the biggest difference between couch and riak is that in order to scale couch you need to implement a sharding layer to split your data between multiple couches (see BigCouch, Lounge). Riak is a distributed system, so all you need to do to scale riak is add more nodes. I once tweeted something like "couch: divide and conquer. Riak: one ring to rule them all."
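For the curious, a small sketch of the difference between the two approaches. It uses textbook consistent hashing, not riak's actual partition claim logic, and compares it with naive hash-modulo sharding of the kind a hand-rolled layer might start with. Ring, ring_hash, modulo_owner and points_per_node are names made up for this example.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a point on the ring (SHA-1 as a big integer)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Textbook consistent hashing; a simplification, not riak's ring."""

    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self.points = []          # sorted (hash, node) pairs
        self.hashes = []          # just the hashes, for bisect
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node claims several points so load spreads evenly.
        for i in range(self.points_per_node):
            self.points.append((ring_hash(f"{node}#{i}"), node))
        self.points.sort()
        self.hashes = [h for h, _ in self.points]

    def owner(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self.hashes, ring_hash(key)) % len(self.points)
        return self.points[i][1]


def modulo_owner(key, nodes):
    """Naive sharding: node chosen by hash modulo node count."""
    return nodes[ring_hash(key) % len(nodes)]
```

With the ring, adding a fourth node only moves keys onto the new node and leaves everything else where it was. With modulo sharding, changing the node count reshuffles most keys, which is why a sharding layer needs more care when you grow it.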

Best, Alexander

@siculars on twitter
http://siculars.posterous.com

Sent from my iPhone

On Jan 28, 2011, at 8:29, Jamie Talbot <[email protected]> wrote:

Hey Joshua,

I'm relatively new to Riak, but have done quite a bit of investigation into CouchDB, so this is as much to confirm my own understanding as anything. With that disclaimer out of the way, here's what I understand about the two.

Couch has excellent database consistency - killing the server process dead won't lose you any data, and recovering after a crash is very quick. Fault tolerance I would say is Riak's biggest selling point, with the ability to configure how many nodes can fail before results can no longer be returned or written. You can kind of achieve fault tolerance with Couch by load-balancing behind a proxy, but it's a kludge compared to the fault-tolerance that is at the very heart of Riak.
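As a back-of-the-envelope illustration of that configurability, here is the usual quorum arithmetic for N replicas with read quorum R and write quorum W (Riak calls these n_val, r and w). This is a strict-quorum simplification that I am assuming for the sketch: it ignores hinted handoff and other details of how Riak behaves in practice, and the function names are made up.

```python
def tolerated_failures(n, r, w):
    """How many of the n replicas can be down while reads/writes still succeed.

    Strict-quorum simplification; hypothetical helper, not a Riak API.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return {"reads": n - r, "writes": n - w}


def read_overlaps_write(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n
```

For example, tolerated_failures(3, 2, 2) says one replica can be lost for both reads and writes, and read_overlaps_write(3, 2, 2) holds because 2 + 2 > 3.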

Both CouchDB and Riak have map/reduce functionality available through REST, using Erlang or JavaScript. With Couch, querying the data can be problematic though, especially on large sets of data, as you have to pre-define views of how you want to extract data and then wait for them to be built. It's certainly not true that you can just choose any old design and then figure things out later. Building views can take a long time - on a few hundred million rows in my sample, it took a number of weeks to build one relatively minor view (though hardware was quite limited). This makes rapid application development (RAD) with CouchDB difficult, and was a significant business risk. The upside here is that once built, I could query 7 years of ISP data at year, month, day, hour, minute granularity, across any cross-section of services in a handful of milliseconds. This was incredible, and pretty addictive - it's lightning fast, for very specific use cases.
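A minimal sketch of the kind of view that gives you that granularity, written as a Python simulation rather than an actual Couch design document. The record fields (timestamp, bytes) are hypothetical, not my real schema. The idea is to emit a compound key of (year, month, day, hour, minute) and then group on a prefix of it, much like Couch's group_level query parameter. Couch keeps reduced values inside the view b-tree so it does not rescan rows at query time; this toy just recomputes them.

```python
from collections import defaultdict


def map_record(record):
    """Emit one (key, value) row per record, keyed by time components."""
    # record["timestamp"] is assumed to be (year, month, day, hour, minute).
    yield tuple(record["timestamp"]), record["bytes"]


def build_view(records):
    """The slow, up-front part: run the map over everything and sort by key."""
    return sorted(kv for r in records for kv in map_record(r))


def query(view_rows, group_level):
    """Sum values grouped by the first group_level parts of the key."""
    totals = defaultdict(int)
    for key, value in view_rows:
        totals[key[:group_level]] += value
    return dict(totals)
```

With this, query(view, 1) gives per-year totals, query(view, 2) per-month totals, and so on down to the minute, all from the one pre-built view.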

The space requirements for Couch are enormous though, as updates and even deletes increase the size of the DB until it is compacted. Riak too will use additional space to store duplicate copies of data on different nodes, to provide fault tolerance, though from my experiments the overhead is nothing like that of recent versions of CouchDB for my specific use cases. Your mileage will vary greatly, based on your configuration of Riak and the characteristics of your Couch views.

Riak, from what I understand, is not currently particularly well-suited to retrieving large amounts of data sequentially by key, but CouchDB works very quickly here, as long as you have defined a suitable view.

Couch does bi-directional replication, though I did find that a little flaky, sometimes dying for no reason. No data loss of course, and it did eventually sync, but frustrating nonetheless. This was as of the previous version. Riak does replication of data as part of its architecture, but if you want to scale to multiple datacentres, you need the enterprise, non-free version.

Scalability is hard with Couch, from what I can tell - certainly not the ability just to add a new node for better performance like you can with Riak. For me, this is a killer feature of Riak.

Couch has a nice subscription mechanism for changes to the database, which allows you to set triggers and the like. Don't be fooled by the talk of document versioning though - it is built in, but it exists purely so that MVCC (replication and concurrency) can work, and old versions of documents are specifically removed whenever the database is compacted.
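To illustrate the versioning caveat, a toy sketch of revision-checked updates where compaction throws away old bodies. It is a simplification under my own assumptions: real Couch revisions are strings rather than plain integers, and ToyDocStore and Conflict are invented names, not Couch's API.

```python
class Conflict(Exception):
    """Raised when an update does not name the current revision."""


class ToyDocStore:
    def __init__(self):
        self.revs = {}   # doc_id -> list of (rev_number, body), oldest first

    def put(self, doc_id, body, rev=None):
        history = self.revs.get(doc_id, [])
        current = history[-1][0] if history else None
        if rev != current:
            # The writer was working from a stale (or nonexistent) revision.
            raise Conflict(f"expected rev {current}, got {rev}")
        new_rev = (current or 0) + 1
        self.revs[doc_id] = history + [(new_rev, body)]
        return new_rev

    def get(self, doc_id, rev=None):
        history = self.revs[doc_id]
        if rev is None:
            return history[-1][1]
        for number, body in history:
            if number == rev:
                return body
        raise KeyError(f"rev {rev} of {doc_id} is gone")

    def compact(self):
        # Keep only the latest revision of each document.
        self.revs = {doc_id: h[-1:] for doc_id, h in self.revs.items()}
```

Before compact() an old revision can still be fetched. Afterwards only the latest one survives, so revisions are no substitute for keeping your own history.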

This page has a high-level comparison of a number of NoSQL options, including Riak and CouchDB, which was generally considered to be pretty reasonable: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Hopefully that's a reasonable representation of the two systems. I will let more seasoned pros correct and expand on the above as necessary!

Cheers,

Jamie.

PS: Hello, list!

On Fri, Jan 28, 2011 at 21:44, Joshua Partogi <[email protected]> wrote:
Hi there.

Has anyone here done any comparison between Riak and CouchDB? I am interested to see how similar and different Riak is compared to Couch. If this can be added to the Riak wiki, I think it would be great for all of us here.

Thanks heaps.

Kind regards,
Joshua.
--
http://twitter.com/jpartogi

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

