Re: could cassandra be split into two parts?

Jun Rao Mon, 13 Apr 2009 09:01:52 -0700

It would be great if Cassandra could provide an abstract storage layer that
allows different storage engines to be plugged in. It seems that the abstract
layer would have to define at least the following:
1. How to log data (if necessary)?
2. How to flush in-memory data?
3. How to compact data (if necessary)?
4. How to split data on a node?
5. How to do hinted handoff?
6. How to do conflict resolution and read repairs?
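
Read as an interface, the list above might look something like the following minimal Python sketch. Cassandra itself is written in Java; Python is used here only for brevity. Every name (`StorageEngine`, `Cell`, `InMemoryEngine`, and the method names) is hypothetical, and conflict resolution is assumed to be last-write-wins by timestamp, which is only one possible policy.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Cell:
    """A versioned value; the timestamp drives conflict resolution (assumed)."""
    value: bytes
    timestamp: int


class StorageEngine(ABC):
    """Hypothetical pluggable-engine contract, one method per item above."""

    @abstractmethod
    def log(self, key: str, cell: Cell) -> None:
        """1. Record a write before applying it (may be a no-op)."""

    @abstractmethod
    def flush(self) -> None:
        """2. Persist in-memory data."""

    @abstractmethod
    def compact(self) -> None:
        """3. Merge persisted data (may be a no-op)."""

    @abstractmethod
    def split(self, boundary: str) -> Tuple[List[str], List[str]]:
        """4. Partition this node's keys at a boundary key."""

    @abstractmethod
    def store_hint(self, target_node: str, key: str, cell: Cell) -> None:
        """5. Hold a write for a replica that is down (hinted handoff)."""

    @abstractmethod
    def resolve(self, versions: Iterable[Cell]) -> Cell:
        """6. Pick the winning version among conflicting ones."""


class InMemoryEngine(StorageEngine):
    """Toy engine: Python lists and dicts stand in for real on-disk structures."""

    def __init__(self) -> None:
        self.commit_log: List[Tuple[str, Cell]] = []
        self.memtable: Dict[str, Cell] = {}
        self.flushed: List[Dict[str, Cell]] = []
        self.hints: Dict[str, List[Tuple[str, Cell]]] = {}

    def put(self, key: str, cell: Cell) -> None:
        self.log(key, cell)
        current = self.memtable.get(key)
        self.memtable[key] = cell if current is None else self.resolve([current, cell])

    def log(self, key, cell):
        self.commit_log.append((key, cell))

    def flush(self):
        # Snapshot the memtable as an immutable "flushed table", then clear it.
        if self.memtable:
            self.flushed.append(dict(self.memtable))
            self.memtable.clear()

    def compact(self):
        # Merge all flushed tables into one, resolving duplicate keys.
        merged: Dict[str, Cell] = {}
        for table in self.flushed:
            for key, cell in table.items():
                merged[key] = cell if key not in merged else self.resolve([merged[key], cell])
        self.flushed = [merged] if merged else []

    def split(self, boundary):
        keys = sorted(set(self.memtable).union(*self.flushed))
        return [k for k in keys if k < boundary], [k for k in keys if k >= boundary]

    def store_hint(self, target_node, key, cell):
        self.hints.setdefault(target_node, []).append((key, cell))

    def resolve(self, versions):
        # Last-write-wins: the highest timestamp is kept.
        return max(versions, key=lambda c: c.timestamp)
```

A MySQL- or BDB-backed engine would implement the same methods; whether items 5 and 6 belong in each engine or in the shared layer above it is left open here.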


Is there anything else? Some of the above need more thought if a new
storage engine can expose new APIs. For example, if we plug in MySQL and
expose an SQL-like API, then even for a simple query like "select a+b from T
where row='row1'", it's not clear how read repairs should be done.
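
To make the read-repair concern concrete: with a put/get API a read returns a stored, timestamped value, so the coordinator can compare the replicas' answers and write the newest one back to stale replicas. A computed result such as `a+b` has no single stored cell to write back. The Python sketch below assumes last-write-wins timestamps; `read_with_repair` and `read_sum_with_repair` are hypothetical names, and decomposing the query into per-cell reads is just one possible approach, which only works if the engine can report which cells a query touched.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    value: int
    timestamp: int


def read_with_repair(replicas: List[Dict[str, Cell]], key: str) -> Optional[Cell]:
    """Return the newest version of `key` and write it back to stale replicas."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    newest = max(versions, key=lambda c: c.timestamp)
    for replica in replicas:
        current = replica.get(key)
        if current is None or current.timestamp < newest.timestamp:
            replica[key] = newest  # repairable because the answer is itself a stored cell
    return newest


def read_sum_with_repair(replicas: List[Dict[str, Cell]], row: str) -> Optional[int]:
    """Hypothetical: repair the underlying cells first, then compute a+b."""
    a = read_with_repair(replicas, f"{row}:a")
    b = read_with_repair(replicas, f"{row}:b")
    if a is None or b is None:
        return None
    return a.value + b.value
```

If the engine only returned the finished sum, replicas answering 3 and 7 could not be reconciled, since neither number carries a timestamp or maps back to a cell.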

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

[email protected]


Prashant Malik <[email protected]> wrote on 04/11/2009 07:40:05 PM:

> We have talked for a while about the possibility of plugging various stores
> into Cassandra; it should definitely be possible.
> The top-layer replication semantics would be handled by Cassandra, but
> components like conflict resolution might have to be built for each
> storage engine.
>
> Overall this would be a good split.
>
> - Prashant
>
> On Sat, Apr 11, 2009 at 6:58 PM, Ian Holsman <[email protected]> wrote:
>
> >
> > On 12/04/2009, at 11:44 AM, Sandeep Tata wrote:
> >
> >> Depends on what exactly you have in mind...
> >>
> >> Almost all of the storage engine logic is in the db package. I don't
> >> think it would be too hard to make this pluggable so you could slide
> >> in your own DB, say based on Derby/MySQL/BDB etc... I can see how
> >> specialized implementations of the database part could be useful for
> >> different apps.
> >>
> >> Do you expect that the API will still be the same put/get style Thrift
> >> API? Or are you hoping to expose the additional abilities of the
> >> underlying DB through the Thrift API? That makes the question more
> >> interesting (and complicated).
> >>
> >
> > Initially it could be done via the put/get API, as most things (that I
> > envisage) would work that way.
> > But it would be nice to be able to have custom APIs implemented via
> > Thrift, with Cassandra just routing each call to the required server and
> > passing it through, kind of like a proxy.
> >
> > e.g.
> > execSQL("cat", "select foo from bar where id='cat'") would use "cat" as
> > the key and route that to the appropriate MySQL engine.
> >
> > I would hope Cassandra could handle the replication component of it,
> > not MySQL.
> >
> >> On Sat, Apr 11, 2009 at 6:33 PM, Ian Holsman <[email protected]> wrote:
> >>
> >>> hey.
> >>>
> >>> I was wondering how feasible it would be to decouple the P2P layer of
> >>> Cassandra from the storage engine.
> >>> I'd like to be able to plug in a non-column DB underneath and use the
> >>> DHT layer of Cassandra.
> >>>
> >>> Is this something anyone else has considered doing?
> >>> --
> >>> Ian Holsman
> >>> [email protected]
> > --
> > Ian Holsman
> > [email protected]
> >
> >
> >
> >
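
Ian's execSQL("cat", ...) proposal combines two things: routing a request by its key, and letting Cassandra rather than MySQL decide where the replicas live. A minimal Python sketch of both, assuming a consistent-hashing ring with MD5-derived tokens and a simplified "next N nodes clockwise" replica placement; `Ring`, `exec_sql`, and the node names are hypothetical, and the SQL text is passed through untouched, as in the proxy idea.

```python
import bisect
import hashlib
from typing import Dict, List


def token(key: str) -> int:
    """Map a key (or node name) to a position on the ring via MD5 (assumed)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes: List[str]) -> None:
        # Sorted (token, node) pairs; a key belongs to the first node at or after its token.
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key: str, n: int) -> List[str]:
        """First node clockwise from the key's token, plus the next n-1 distinct nodes."""
        positions = [t for t, _ in self.tokens]
        start = bisect.bisect_left(positions, token(key)) % len(self.tokens)
        count = min(n, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


def exec_sql(ring: Ring, engines: Dict[str, List[str]], key: str, sql: str, n: int = 3) -> List[str]:
    """Hypothetical pass-through: forward the unparsed SQL to every replica of `key`."""
    targets = ring.replicas(key, n)
    for node in targets:
        engines.setdefault(node, []).append(sql)  # stand-in for a real network call
    return targets
```

Because the ring only sees the routing key, the storage engine behind each node never needs to know about replication; the catch, as noted earlier in the thread, is that repair and conflict resolution then depend on what the engine returns.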
