It would be great if Cassandra could provide an abstract storage layer that allows different storage engines to be plugged in. It seems that the abstract layer has to define at least the following:

1. How to log data (if necessary)?
2. How to flush in-memory data?
3. How to compact data (if necessary)?
4. How to split data on a node?
5. How to do hinted handoff?
6. How to do conflict resolution and read repair?
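The six questions above can be read as the methods of a Java interface that every engine would implement. This is a hypothetical sketch: `StorageEngine`, `InMemoryEngine`, `Versioned` and all method names are illustrative and are not Cassandra classes. The toy engine also assumes last-write-wins by timestamp for conflict resolution, which is only one possible policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** A value plus the timestamp used to resolve conflicts between replicas (hypothetical). */
final class Versioned {
    final String value;
    final long timestamp;
    Versioned(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
}

/** Hypothetical pluggable-engine contract; each method maps to one numbered question. */
interface StorageEngine {
    void log(String key, Versioned v);                     // 1. commit log (may be a no-op)
    void apply(String key, Versioned v);                   // write into in-memory state
    int flush();                                           // 2. flush in-memory data
    void compact();                                        // 3. compaction (may be a no-op)
    List<String> split(String boundaryKey);                // 4. hand off keys >= boundaryKey
    void storeHint(String node, String key, Versioned v);  // 5. hinted handoff
    Versioned resolve(Versioned a, Versioned b);           // 6. conflict resolution / read repair
}

/** Toy engine: a sorted memtable that is flushed into a sorted "disk" map. */
final class InMemoryEngine implements StorageEngine {
    final List<String> commitLog = new ArrayList<>();
    final TreeMap<String, Versioned> memtable = new TreeMap<>();
    final TreeMap<String, Versioned> disk = new TreeMap<>();
    final Map<String, List<String>> hints = new HashMap<>();

    public void log(String key, Versioned v) {
        commitLog.add(key + "=" + v.value + "@" + v.timestamp);
    }

    public void apply(String key, Versioned v) {
        log(key, v);                              // log before applying
        memtable.merge(key, v, this::resolve);    // keep the winning version
    }

    public int flush() {
        int n = memtable.size();
        memtable.forEach((k, v) -> disk.merge(k, v, this::resolve));
        memtable.clear();
        commitLog.clear();                        // flushed writes no longer need replay
        return n;
    }

    public void compact() { /* one sorted map has nothing to merge */ }

    public List<String> split(String boundaryKey) {
        TreeSet<String> moved = new TreeSet<>(memtable.tailMap(boundaryKey, true).keySet());
        moved.addAll(disk.tailMap(boundaryKey, true).keySet());
        memtable.tailMap(boundaryKey, true).clear(); // clearing the view removes from the map
        disk.tailMap(boundaryKey, true).clear();
        return new ArrayList<>(moved);
    }

    public void storeHint(String node, String key, Versioned v) {
        hints.computeIfAbsent(node, n -> new ArrayList<>()).add(key + "=" + v.value + "@" + v.timestamp);
    }

    /** Last-write-wins: the strictly newer timestamp wins, ties keep the existing value. */
    public Versioned resolve(Versioned a, Versioned b) {
        return b.timestamp > a.timestamp ? b : a;
    }
}
```

An engine with its own durability (for example a transactional database) could make `log` and `compact` no-ops, while Cassandra keeps ownership of routing and replica placement.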
Is there anything else? Some of the above need more thought if a new storage engine can expose new APIs. For example, if we plug in MySQL and expose an SQL-like API, then even for a simple query like "select a+b from T where row='row1'", it's not clear how read repair should be done.

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099

Prashant Malik wrote on 04/11/2009 07:40:05 PM:

> We have talked for a while about the possibility of plugging various
> stores into Cassandra; it should definitely be possible.
> The top-layer replication semantics would be handled by Cassandra, but
> components like conflict resolution might have to be built for each
> storage engine.
>
> Overall this would be a good split.
>
> - Prashant
>
> On Sat, Apr 11, 2009 at 6:58 PM, Ian Holsman wrote:
>
> > On 12/04/2009, at 11:44 AM, Sandeep Tata wrote:
> >
> >> Depends on what exactly you have in mind ...
> >>
> >> Almost all of the storage engine logic is in the db package. I don't
> >> think it would be too hard to make this pluggable so you could slide
> >> in your own DB, say based on Derby/MySQL/BDB etc. I can see how
> >> specialized implementations of the database part could be useful for
> >> different apps.
> >>
> >> Do you expect that the API will still be the same put/get-style Thrift
> >> API? Or are you hoping to expose the additional abilities of the
> >> underlying DB through the Thrift API? That makes the question more
> >> interesting (and complicated).
> >
> > Initially it could be done via the put/get API, as most things that I
> > envisage would work that way.
> > But it would be nice to be able to have custom APIs implemented via
> > Thrift, with Cassandra just routing each call to the required server
> > and passing it through, kind of like a proxy.
> >
> > e.g.
> > execSQL("cat", "select foo from bar where id='cat'") would use "cat" as the
> > key and route that to the appropriate MySQL engine.
> >
> > I would hope Cassandra could handle the replication component of it, not
> > MySQL.
> >
> >> On Sat, Apr 11, 2009 at 6:33 PM, Ian Holsman wrote:
> >>
> >>> hey.
> >>>
> >>> I was wondering how feasible it would be to decouple the P2P layer of
> >>> Cassandra from the storage engine.
> >>> I'd like to be able to plug in a non-column DB underneath and use the
> >>> DHT layer of Cassandra.
> >>>
> >>> Is this something anyone else has considered doing?
> >>> --
> >>> Ian Holsman
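Ian's pass-through idea, where the first argument of execSQL is the routing key and the query text is forwarded untouched to the engine on the owning node, can be sketched over a consistent-hash ring. Everything here is an assumption for illustration: `RoutingProxy`, `QueryBackend`, the MD5-based `token` function and the clockwise replica walk are not taken from Cassandra's code. The proxy forwards only to the primary replica; how replication and read repair should apply to an opaque SQL statement is the open question Jun raises at the top of the thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical engine-side hook that executes an opaque query locally (e.g. against MySQL). */
interface QueryBackend {
    String exec(String query);
}

/** Sketch of a key-routed pass-through proxy over a consistent-hash ring (hypothetical). */
final class RoutingProxy {
    private final TreeMap<Long, String> ring = new TreeMap<>();       // token -> node
    private final Map<String, QueryBackend> backends = new HashMap<>();

    /** Token = first 8 bytes of the MD5 digest of the string. */
    static long token(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(d).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide MD5
        }
    }

    void addNode(String node, QueryBackend backend) {
        ring.put(token(node), node);
        backends.put(node, backend);
    }

    /** Walk clockwise from the key's token, collecting up to rf distinct nodes. */
    List<String> replicasFor(String key, int rf) {
        List<String> out = new ArrayList<>();
        if (ring.isEmpty()) return out;
        Long t = ring.ceilingKey(token(key));
        if (t == null) t = ring.firstKey();              // wrap around the ring
        while (out.size() < Math.min(rf, ring.size())) {
            out.add(ring.get(t));
            t = ring.higherKey(t);
            if (t == null) t = ring.firstKey();
        }
        return out;
    }

    /** Route by key and pass the query text through unchanged to the primary replica. */
    String execSQL(String key, String sql) {
        List<String> replicas = replicasFor(key, 1);
        if (replicas.isEmpty()) throw new IllegalStateException("empty ring");
        return backends.get(replicas.get(0)).exec(sql);
    }
}
```

A write path could instead call `exec` on every node returned by `replicasFor(key, rf)`, which keeps replication in Cassandra rather than in MySQL, as Ian suggests.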
