Hey Jonathan:

On Tue, Sep 1, 2009 at 3:12 PM, Jonathan Ellis <[email protected]> wrote:
> The big win for Cassandra is that its p2p distribution model -- which
> drives the consistency model -- means there is no single point of
> failure. SPOF can be mitigated by failover but it's really, really
> hard to get all the corner cases right with that approach. Even
> Google with their 3 year head start and huge engineering resources
> still has trouble with that occasionally. (See e.g.
> http://groups.google.com/group/google-appengine/msg/ba95ded980c8c179.)

It's hard to answer the above -- that no SPOF is better than failover
because some corner cases will be missed, as though P2P were without
corners -- so I'll pass on it.

> > + Cassandra does not have a natural sharding notion as there is in
> > HBase -- i.e. HBase Regions -- so hooking Cassandra to MapReduce is
> > awkward.
>
> Actually that's not a big deal -- the token ring is known, so you can
> break up at a coarse granularity there, and each node has a sampling
> of the keys stored on it thanks to the way the sstable indexing works,
> so generating hadoop input regions is pretty easy. Jeff Hodges wrote
> a proof of concept over at
> https://issues.apache.org/jira/browse/CASSANDRA-342.

Thanks. Yeah, I'd read that issue before making the comment. It was my
reading of the issue that provoked my 'awkward' comment.

> > + The Cassandra fellas talk of their app being one ball of code only
> > whereas with HBase there is HDFS, ZooKeeper and then HBase itself
> > (Apparently it has fewer lines of code too).
>
> Opinions may differ, but I still think this is a huge win for
> troubleshooting.

The parenthetical was to poke fun at what, IMO, is a silly gauge for
comparing very different projects.

Go easy,
St.Ack
