[Amdatu-developers] Cassandra storage: dynamically create column families

Angelo van der Sijpt Wed, 20 Oct 2010 09:37:12 +0200

Hi List,

In the process of moving to Cassandra 0.7, Ivo and I hit upon some parts of the 
current Cassandra setup that could be overhauled a bit.


The first part is the ColumnFamilyAvailable interface. This is used to signal to a 
user of the persistence manager that a given column family is now ready for 
use. We propose to replace this with service properties on the 
CassandraPersistenceManager. The current mechanism creates a PersistenceManager 
service for each Keyspace, and puts that in the service properties. If we add 
the column families to that, as a String[], clients can filter on something 
like (&(KS=MyKeyspace)(KF=one)(KF=two)), where the persistence service has 
properties like ["KS" => "MyKeyspace", "KF" => new String[] {"one", "two", 
"seven"}].
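
To make the matching concrete: in an OSGi filter, a term like (KF=one) against 
a String[] property is satisfied when any element of the array equals the 
value. The sketch below imitates that rule in plain Java, so it runs without 
an OSGi framework. The class and method names are made up for illustration; 
in a real bundle the framework's own filter matching would do this work.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ColumnFamilyMatch {

    /**
     * Returns true when the service properties name the wanted keyspace and
     * list every wanted column family. This imitates how a filter such as
     * (&(KS=MyKeyspace)(KF=one)(KF=two)) treats a String[] property: each
     * (KF=...) term matches if any array element equals the value.
     */
    static boolean matches(Map<String, Object> props, String keyspace, String... families) {
        if (!keyspace.equals(props.get("KS"))) {
            return false;
        }
        Object kf = props.get("KF");
        if (!(kf instanceof String[])) {
            return false;
        }
        List<String> available = Arrays.asList((String[]) kf);
        return available.containsAll(Arrays.asList(families));
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("KS", "MyKeyspace");
        props.put("KF", new String[] {"one", "two", "seven"});

        // Both requested families are published, so this client can start.
        System.out.println(matches(props, "MyKeyspace", "one", "two"));
        // "three" is not published yet, so this client keeps waiting.
        System.out.println(matches(props, "MyKeyspace", "one", "three"));
    }
}
```

Note that extra column families on the service ("seven" here) do not get in 
the way of a client that only asks for "one" and "two".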



So far so good; this cleans up the interface quite a bit. Still, we have the 
matter of the CassandraColumnFamilyProvider, which feels a little weird. Also, 
who is responsible for actually creating the column families we use? When can 
we remove a service's data?
With this, we should note that Cassandra 0.7 has a clean API for adding and 
removing (super)columns and column families, which we currently expose.

We came up with a number of possible alternatives, all of which are open for 
debate.


1. Current solution
As a persistence user, you publish a service that describes the column families 
you need. You then wait until a persistence manager with the correct properties 
comes along.

1a. Current solution, but with better dependency manager integration
We can take away the burden of registering a ColumnFamilyProvider by building a 
custom DependencyManager dependency, something like
.add(createCassandraDependency("myKeyspace")
  .add(createColumnFamily("LaFamilia")
    .add(createColumn("one", "int"))
    .add(createColumn("two", "string"))))

The main problem with this approach is, when can we remove data (if we remove 
data at all)? We can't just remove it all when the service goes away, since its 
bundle might be in the process of being updated.


2. Manifest headers
To get around the problem of removing the data when the service goes away, we 
could use a more static solution: add information about the required columns to 
the manifest of the bundle that wants to use it. We can still have the 
ColumnListener, but this time it watches for new bundles being installed, 
parses their manifests, and does its magic in much the same way that Pax Web 
picks up new WARs. This way, we _only_ remove a bundle's data when the bundle is 
explicitly removed, not when it is being updated.
The downside of this approach is that it relies on rather static data, making 
it impossible to add new column families (for a new tenant, for instance) 
without deploying new bundles.
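
As a rough idea of what the manifest route could look like, here is a small 
sketch that reads a header from a manifest and splits it into column family 
entries. The header name X-Cassandra-ColumnFamilies and the 
"keyspace/family" value format are assumptions invented for this example, not 
something that exists today. The sketch uses java.util.jar.Manifest so that 
it runs standalone; inside the framework a bundle listener would read the 
same header from the bundle's headers instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

final class ColumnFamilyHeader {

    // Hypothetical header name, made up for this sketch.
    static final String HEADER = "X-Cassandra-ColumnFamilies";

    /** Builds a Manifest from its textual form (must end with a newline). */
    static Manifest fromText(String text) {
        try {
            return new Manifest(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits "ks/family, ks/family" into trimmed entries; empty if the header is absent. */
    static List<String> parse(Manifest manifest) {
        List<String> result = new ArrayList<>();
        String value = manifest.getMainAttributes().getValue(HEADER);
        if (value == null) {
            return result; // this bundle does not declare any column families
        }
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Manifest manifest = fromText(
            "Manifest-Version: 1.0\n"
            + "Bundle-SymbolicName: org.example.client\n"
            + HEADER + ": myKeyspace/LaFamilia, myKeyspace/Other\n");
        System.out.println(parse(manifest));
    }
}
```

Column definitions (names and types) would need a richer value syntax than 
this; the sketch only covers the keyspace and column family names.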


These questions are influenced by the way we want to use our storage. Is 
storage a 'given' for some application, or is the application allowed to mess 
around with it? How do we resolve disputes, i.e. two services that reuse the 
same keyspace and column family name, but with different definitions?


Any great insights?

Angelo
