Hi List,
In the process of moving to Cassandra 0.7, Ivo and I hit upon some parts of the
current Cassandra setup that could be overhauled a bit.
First part is the ColumnFamilyAvailable interface. This is used to signal to a
user of the persistence manager that a given column family is now ready for
use. We propose to replace this with service properties on the
CassandraPersistenceManager. The current mechanism creates a PersistenceManager
service for each keyspace, and puts the keyspace name in the service properties.
If we add the column families to that, as a String[], clients can filter on
something like (&(KS=MyKeyspace)(KF=one)(KF=two)), where the persistence service
has properties like ["KS" => "MyKeyspace", "KF" => new String[] {"one", "two",
"seven"}].
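To illustrate the matching we have in mind, here is a minimal plain-Java sketch. It does not use the OSGi API; it only mimics how a filter treats an array-valued property (a (KF=x) term matches if any element of the array equals x). The class and method names are made up for this mail; in a real bundle the framework evaluates the filter for us.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustration only: mimics filter matching against service properties.
// ServicePropertyMatch and its methods are hypothetical names.
class ServicePropertyMatch {

    // True if the property equals the wanted value, or, for a String[],
    // if any element of the array equals it.
    static boolean matches(Map<String, Object> props, String key, String wanted) {
        Object value = props.get(key);
        if (value instanceof String[]) {
            return Arrays.asList((String[]) value).contains(wanted);
        }
        return wanted.equals(value);
    }

    // Rough equivalent of (&(KS=<keyspace>)(KF=<family1>)(KF=<family2>)...).
    static boolean satisfies(Map<String, Object> props, String keyspace, String... families) {
        if (!matches(props, "KS", keyspace)) {
            return false;
        }
        for (String family : families) {
            if (!matches(props, "KF", family)) {
                return false;
            }
        }
        return true;
    }

    // The service properties from the example above.
    static Map<String, Object> exampleProperties() {
        Map<String, Object> props = new HashMap<String, Object>();
        props.put("KS", "MyKeyspace");
        props.put("KF", new String[] {"one", "two", "seven"});
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> props = exampleProperties();
        System.out.println(satisfies(props, "MyKeyspace", "one", "two"));
        System.out.println(satisfies(props, "MyKeyspace", "one", "three"));
    }
}
```

A client asking for "one" and "two" is satisfied even though the service also offers "seven"; a client asking for a family that is not listed keeps waiting.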
So far so good, this cleans up the interface quite a bit. Still, we have the
matter of the CassandraColumnFamilyProvider, which feels a little weird. Also,
who is responsible for actually creating the column families we use? When can
we remove a service's data?
With this, we should note that Cassandra 0.7 has a clean API for adding and
removing (super)columns and column families, which we currently expose.
We came up with a number of possible alternatives, all of which are open for
debate.
1. Current solution
As a persistence user, you publish a service that describes the column families
you need. You then wait until a persistence manager with the correct properties
comes along.
1a. Current solution, but with better dependency manager integration
We can take away the burden of registering a ColumnFamilyProvider by building a
custom DependencyManager dependency, something like
    .add(createCassandraDependency("myKeyspace")
        .add(createColumnFamily("LaFamilia")
            .add(createColumn("one", "int"))
            .add(createColumn("two", "string"))))
The main problem with this approach is, when can we remove data (if we remove
data at all)? We can't just remove it all when the service goes away, since its
bundle might be in the process of being updated.
2. Manifest headers
To get around the problem of removing the data when the service goes away, we
could use a more static solution: add information about the required columns to
the manifest of the bundle that wants to use them. We can still have the
ColumnListener, but this time it watches for new bundles being installed,
parses their manifests, and does its magic in much the same way that Pax Web
picks up new WARs. This way, we _only_ remove a bundle's data when the bundle
is explicitly removed, not when it is being updated.
The downside of this approach is that it relies on rather static data, making
it impossible to add new column families (for a new tenant, for instance)
without deploying new bundles.
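To give an idea of what the manifest approach could look like, here is a minimal sketch that reads a header with java.util.jar.Manifest. The header name X-Cassandra-ColumnFamilies and the "keyspace/columnFamily" entry format are pure assumptions for the sake of the example, as is the class name; we have not settled on any of them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

// Illustration only: parses a hypothetical manifest header.
class ColumnFamilyHeaderParser {

    // Hypothetical header name; nothing defines it today.
    static final String HEADER = "X-Cassandra-ColumnFamilies";

    // Returns the comma-separated "keyspace/columnFamily" entries from the
    // header, or an empty list if the bundle does not declare any.
    static List<String> parse(Manifest manifest) {
        List<String> result = new ArrayList<String>();
        String value = manifest.getMainAttributes().getValue(HEADER);
        if (value == null) {
            return result;
        }
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Helper for the example: builds a Manifest from its textual form.
    static Manifest manifestOf(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Manifest manifest = manifestOf(
                "Manifest-Version: 1.0\n"
                + "Bundle-SymbolicName: com.example.store\n"
                + "X-Cassandra-ColumnFamilies: MyKeyspace/one, MyKeyspace/two\n");
        System.out.println(parse(manifest));
    }
}
```

The listener would call something like parse() for each newly installed bundle and create whatever is missing; column definitions would need a richer header syntax than this sketch has.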
These questions are influenced by the way we want to use our storage. Is
storage a 'given' for some application, or is the application allowed to mess
around with it? How do we resolve disputes, e.g. two services that reuse the
same keyspace and column family name, but with different definitions?
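As a starting point for the dispute question, here is a minimal sketch of one possible policy: the first declaration of a keyspace/column family pair wins, an identical redeclaration is accepted, and a differing one is rejected. Both the policy and the ColumnFamilyRegistry name are assumptions for discussion, not something we have decided on.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: detects conflicting column family declarations.
// "First declaration wins" is just one possible policy.
class ColumnFamilyRegistry {

    // Key: "keyspace/columnFamily"; value: column name -> type, as first declared.
    private final Map<String, Map<String, String>> definitions =
            new HashMap<String, Map<String, String>>();

    // Returns true if the declaration is new or identical to the existing one,
    // false if it conflicts with an earlier declaration.
    boolean declare(String keyspace, String family, Map<String, String> columns) {
        String key = keyspace + "/" + family;
        Map<String, String> existing = definitions.get(key);
        if (existing == null) {
            // Copy, so later changes to the caller's map do not leak in.
            definitions.put(key, new TreeMap<String, String>(columns));
            return true;
        }
        return existing.equals(columns);
    }

    public static void main(String[] args) {
        ColumnFamilyRegistry registry = new ColumnFamilyRegistry();

        Map<String, String> first = new HashMap<String, String>();
        first.put("one", "int");
        first.put("two", "string");

        Map<String, String> different = new HashMap<String, String>();
        different.put("one", "string");

        System.out.println(registry.declare("myKeyspace", "LaFamilia", first));
        System.out.println(registry.declare("myKeyspace", "LaFamilia", different));
    }
}
```

What to do with a rejected declaration (refuse to start the service, log and continue, or try to merge) is exactly the part that is still open.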
Any great insights?
Angelo