It would be great if Cassandra could provide an abstract storage layer
that allows different pluggable storage engines. It seems that the
abstract layer has to define at least the following:
1. How to log data (if necessary)?
2. How to flush in-memory data?
3. How to compact data (if necessary)?
4. How to split data on a node?
5. How to do hinted handoff?
6. How to do conflict resolution and read repairs?

Is there anything else? Some of the above need more thought if a new
storage engine can expose new APIs. For example, if we plug in MySQL and
expose a SQL-like API, then even for simple queries like "select a+b from T
where row='row1'", it's not clear how read repairs should be done.
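
To make the list above a bit more concrete, here is a rough sketch of what
such an abstract layer might look like. All names here (StorageEngine,
Versioned, InMemoryEngine) are hypothetical and not taken from the Cassandra
code base, and last-write-wins by timestamp is only an assumed conflict
resolution rule used to keep the toy engine small.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** A value plus the timestamp used for conflict resolution (assumed scheme). */
final class Versioned {
    final String value;
    final long timestamp;
    Versioned(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
}

/** Hypothetical contract: one group of methods per question in the list. */
interface StorageEngine {
    void log(String key, Versioned v);                   // 1. may be a no-op
    void put(String key, Versioned v);
    Versioned get(String key);
    void flush();                                        // 2. in-memory -> stable
    void compact();                                      // 3. may be a no-op
    SortedMap<String, Versioned> split(String fromKey);  // 4. remove and return keys >= fromKey
    void storeHint(String node, String key, Versioned v); // 5. keep a write for a down node
    Map<String, Versioned> drainHints(String node);      // 5. hand hints over and forget them
    Versioned resolve(Versioned a, Versioned b);         // 6. also usable for read repair
}

/** Toy engine that keeps everything in memory; for illustration only. */
final class InMemoryEngine implements StorageEngine {
    private final List<String> commitLog = new ArrayList<>();
    private final TreeMap<String, Versioned> memtable = new TreeMap<>();
    private final TreeMap<String, Versioned> flushed = new TreeMap<>();
    private final Map<String, Map<String, Versioned>> hints = new HashMap<>();

    public void log(String key, Versioned v) { commitLog.add(key + "@" + v.timestamp); }

    public void put(String key, Versioned v) {
        log(key, v);
        memtable.merge(key, v, this::resolve);
    }

    public Versioned get(String key) {
        Versioned m = memtable.get(key);
        Versioned f = flushed.get(key);
        if (m == null) return f;
        if (f == null) return m;
        return resolve(m, f);
    }

    public void flush() {
        memtable.forEach((k, v) -> flushed.merge(k, v, this::resolve));
        memtable.clear();
        commitLog.clear(); // logged entries are no longer needed once flushed
    }

    public void compact() { /* nothing to do: flushed data is already one map */ }

    public SortedMap<String, Versioned> split(String fromKey) {
        flush();
        SortedMap<String, Versioned> tail = flushed.tailMap(fromKey, true);
        TreeMap<String, Versioned> moved = new TreeMap<>(tail);
        tail.clear(); // clearing the view removes the keys from this node
        return moved;
    }

    public void storeHint(String node, String key, Versioned v) {
        hints.computeIfAbsent(node, n -> new TreeMap<>()).merge(key, v, this::resolve);
    }

    public Map<String, Versioned> drainHints(String node) {
        Map<String, Versioned> h = hints.remove(node);
        return h == null ? new TreeMap<>() : h;
    }

    // Assumed rule: the newer timestamp wins; ties keep the first argument.
    public Versioned resolve(Versioned a, Versioned b) {
        return a.timestamp >= b.timestamp ? a : b;
    }
}

class StorageEngineDemo {
    public static void main(String[] args) {
        StorageEngine engine = new InMemoryEngine();
        engine.put("row1", new Versioned("old", 1L));
        engine.put("row1", new Versioned("new", 2L));
        engine.flush();
        System.out.println(engine.get("row1").value);
    }
}
```

A SQL-style engine would not fit resolve() as written, since a result such
as a+b has no single timestamp to compare, which is exactly the read repair
problem described above.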

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

[email protected]


Prashant Malik <[email protected]> wrote on 04/11/2009 07:40:05 PM:

> We have talked for a while about the possibility of plugging various
> stores into Cassandra, and it should definitely be possible.
> The top layer replication semantics would be handled by cassandra but
> components like conflict resolution etc might have to be built for each
> storage engine.
>
> Overall this would be a good split.
>
> - Prashant
>
> On Sat, Apr 11, 2009 at 6:58 PM, Ian Holsman <[email protected]> wrote:
>
> >
> > On 12/04/2009, at 11:44 AM, Sandeep Tata wrote:
> >
> >> Depends on what exactly you have in mind ...
> >>
> >> Almost all of the storage engine logic is in the db package. I don't
> >> think it would be too hard to make this pluggable so you could slide
> >> in your own DB, say based on Derby/MySQL/BDB etc... I can see how
> >> specialized implementations of the database part could be useful for
> >> different apps.
> >>
> >> Do you expect that the API will still be the same put/get style Thrift
> >> API? Or are you hoping to expose the additional abilities of the
> >> underlying DB through the Thrift API? That makes the question more
> >> interesting (and complicated).
> >>
> >
> > Initially it could be done via the put/get API, as most things would work
> > that way (as I envisage it).
> > But it would be nice to be able to have custom APIs implemented
> > via Thrift, with Cassandra just routing the call to the required server
> > and passing it through, kind of like a proxy.
> >
> > e.g.
> > execSQL("cat", "select foo from bar where id='cat'") would use "cat" as the
> > key and route that to the appropriate MySQL engine.
> >
> > I would hope Cassandra could handle the replication component of it, not
> > MySQL.
> >
> >
> >
> >
> >> On Sat, Apr 11, 2009 at 6:33 PM, Ian Holsman <[email protected]> wrote:
> >>
> >>> hey.
> >>>
> >>> I was wondering how feasible it would be to de-couple the P2P layer of
> >>> Cassandra from the storage engine.
> >>> I'd like to be able to plug in a non-column DB underneath, and use the
> >>> DHT layer of Cassandra.
> >>>
> >>> Is this something anyone else has considered doing?
> >>> --
> >>> Ian Holsman
> >>> [email protected]
> >>>
> >>>
> >>>
> >>>
> >>>
> > --
> > Ian Holsman
> > [email protected]
> >
> >
> >
> >
