On 5/12/07, Matt S Trout <[EMAIL PROTECTED]> wrote:
> On Sat, May 12, 2007 at 03:31:12AM +0100, Dave Cardwell wrote:
> > Hello list.
> >
> > I'd like to solicit your thoughts on the appropriate architecture of a
> > DBIx::Class module for caching objects, similar to the functionality
> > provided by Data::ObjectDriver [1].
> >
> > mst has made me aware that there are several existing, private
> > implementations so I would be particularly interested in those
> > developers' input on authoring a solution for general release.
> I think we actually want to look at more than one layer of caching :)
> A resultset plugin that allows you to share the cached result of a particular
> query across processes would be useful.
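A cross-process resultset cache could be as simple as pushing the plain column
data into memcached, keyed on the source name plus a canonicalised dump of the
search condition. A rough sketch - Cache::Memcached is real, but cached_search,
the key scheme and the 5 minute TTL are all invented for illustration, and it
returns plain hashrefs rather than inflated row objects:

    use Cache::Memcached;
    use Digest::MD5 qw(md5_hex);
    use Data::Dumper;

    my $memd = Cache::Memcached->new({ servers => ['10.0.0.1:11211'] });

    # Hypothetical helper: cache the column data for a resultset, keyed on
    # source name plus a canonicalised dump of the condition/attributes.
    sub cached_search {
        my ($rs, $cond, $attrs) = @_;
        local $Data::Dumper::Sortkeys = 1;
        local $Data::Dumper::Indent   = 0;
        my $key = 'rs:' . $rs->result_source->source_name . ':'
                . md5_hex(Dumper([$cond, $attrs]));

        my $rows = $memd->get($key);
        unless ($rows) {
            $rows = [ map { { $_->get_columns } } $rs->search($cond, $attrs)->all ];
            $memd->set($key, $rows, 300);   # crude 5 minute expiry
        }
        return @$rows;
    }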
> The D::OD cache functionality requires two things -
> (1) An ability to cache fetches by PK
> (2) An ability to effectively expire caches on change
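Stripped right down, those two requirements might amount to something like the
following - again just a sketch, single-column PK assumed, cached_find and
update_and_expire are made-up names, and plain column data is returned for
simplicity:

    use Cache::Memcached;
    my $memd = Cache::Memcached->new({ servers => ['10.0.0.1:11211'] });

    # (1) cache fetches by PK
    sub cached_find {
        my ($rs, $id) = @_;
        my $key  = 'obj:' . $rs->result_source->source_name . ':' . $id;
        my $data = $memd->get($key);
        unless ($data) {
            my $row = $rs->find($id) or return;
            $data = { $row->get_columns };
            $memd->set($key, $data);
        }
        return $data;
    }

    # (2) expire on change - drop the cached copy whenever we write
    sub update_and_expire {
        my ($row, $changes) = @_;
        my $key = 'obj:' . $row->result_source->source_name . ':' . $row->id;
        $row->update($changes);
        $memd->delete($key);
    }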
> I think we can probably achieve both by indirecting both result and resultset
> operations via the resultsource object - it's the last thing that really
> understands the PKs, uniques etc. (the storage object doesn't really and
> I think shouldn't). That would then allow the source to fill caches "on the
> way through" - and also to make simple fetches only fetch the primary key
> and then fill results either from cache or via a 'pk IN (...)' second select
> (which is admittedly a gamble that you mostly hit cache but the idea here is
> that you -do- mostly hit cache). Plus when updates occur the resultsource can
> clear the caches (and in some cases update them) appropriately.
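The "fetch only the primary key, then fill from cache or a pk IN (...) second
select" part could look roughly like this (sketch only - a single PK column
called 'id' is assumed, and original row ordering is ignored):

    use Cache::Memcached;
    my $memd = Cache::Memcached->new({ servers => ['10.0.0.1:11211'] });

    sub fetch_via_pk_cache {
        my ($rs, $cond) = @_;
        my $source = $rs->result_source->source_name;

        # First select: primary keys only
        my @ids = $rs->search($cond, { columns => ['id'] })->get_column('id')->all;

        my (@rows, @misses);
        for my $id (@ids) {
            if (my $cached = $memd->get("obj:$source:$id")) {
                push @rows, $cached;
            } else {
                push @misses, $id;
            }
        }

        # Second select: only the rows we missed, filling the cache on the way through
        if (@misses) {
            for my $row ($rs->search({ id => { -in => \@misses } })->all) {
                my $data = { $row->get_columns };
                $memd->set("obj:$source:" . $row->id, $data);
                push @rows, $data;
            }
        }
        return @rows;
    }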
While I think caching at the DBIx::Class level will be useful for some
people, I would think that a lot of users would leave it turned off
for consistency reasons. Even if you're invalidating the cache
locally on update/delete, that's local to one process. Caches in
other [ithreads, processes, servers, datacenters] won't be
invalidated, and you get inconsistent views of the data.
What would be more interesting to me would be a genericized interface
between DBIx::Class sources and memcached, so that one can just "turn
on" memcached support and give it a few config parameters about where
the memcached servers are, etc. This solves the cache coherency issue
at the [ithreads/processes/servers] level, and people sharing
databases across remote datacenters of course can't use it or need to
come up with something better (as memcached across a WAN probably
doesn't make much sense in most scenarios).
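i.e. something where the only configuration you hand over is the server list,
along these lines - the cache accessor on the schema is entirely made up, only
Cache::Memcached itself is real:

    package My::Schema;
    use base 'DBIx::Class::Schema';
    use Cache::Memcached;

    __PACKAGE__->load_classes;

    # Made-up accessor - the point is that the only configuration needed
    # is the memcached server list (and maybe a namespace).
    my $cache = Cache::Memcached->new({
        servers   => [ '10.0.0.1:11211', '10.0.0.2:11211' ],
        namespace => 'myapp:',
    });
    sub cache { $cache }

    package main;
    my $schema = My::Schema->connect('dbi:Pg:dbname=myapp', 'user', 'pass');
    $schema->cache->set('ping', 1);   # any source-level caching would go via this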
> The second thing we want to steal from D::OD is the ability to distribute
> fetches across partitioned databases. I'm currently torn as to whether this
> is better happening at the source or storage level - I -think- we probably
> want to put this logic in the resultsource as well, since the choice of
> partition is linked tightly into a level of data definition that again the
> storage doesn't need to know about.
> My thought would be to have a composite source that talks to multiple
> underlying source objects, one per partition, and for those to refer back to
> a partition schema object with an appropriate storage object.
I think that sounds like a sane plan. I guess partitioned data either needs
to have no relationships, or needs to keep relationships local to the
partition (perhaps you have no inter-user relationships, every other table
has an FK to the user, and you partition on username). I think it would be
extremely difficult for us to emulate joins across partitions.
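To make that concrete, here's a sketch of routing on username - the shard
list, shard_for and the User/posts relationship are all assumptions. Because
every row for a user lives on one shard, a prefetch join runs against a
single partition's schema and never has to cross shards:

    use Digest::MD5 qw(md5);

    # One connected schema per partition (placeholder DSNs)
    my @shards = map {
        My::Schema->connect("dbi:Pg:dbname=myapp_shard$_", 'user', 'pass')
    } 0 .. 3;

    # Made-up router: the username alone decides which partition to hit
    sub shard_for {
        my ($username) = @_;
        return $shards[ unpack('N', md5($username)) % @shards ];
    }

    # Every table hangs off the user, so all of this user's rows live on
    # the same shard and the prefetch join stays partition-local.
    my $schema = shard_for('fred');
    my $user   = $schema->resultset('User')
                        ->search({ username => 'fred' }, { prefetch => 'posts' })
                        ->first;
    my @posts  = $user->posts;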
-- Brandon
_______________________________________________
List: http://lists.rawmode.org/cgi-bin/mailman/listinfo/dbix-class
Wiki: http://dbix-class.shadowcatsystems.co.uk/
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/trunk/DBIx-Class/
Searchable Archive: http://www.mail-archive.com/[email protected]/