Maybe some background first:

I'm working on code that does queries on large tables.  To be able to support 
the growth of these tables, we need to shard the tables to multiple database 
servers.  One way to shard this evenly, would be to shard per second of data: 
even seconds go to one database, odd seconds go to another.

We have a lot of code that creates minutely / hourly / daily aggregations (in 
Perl, the database just cannot do the kind of aggregations that we need) by 
querying these tables.

When moving to sharded tables, I would like to keep the changes to the code as 
minimal as possible.

By monkeypatching DBI::st, I've been able to reduce the code change to only 
having to add another loop construct to the code, so something like:

=============================
my %visitors;
my $values;
$visitors{ $values->[0] }= undef
  while $values= $sth->fetchrow_arrayref;
printf "Found %d unique visitors\n", scalar keys %visitors;
=============================

Becomes:

=============================
my %visitors;
my $values;
while ( $sth= $sth->next ) {
    $visitors{ $values->[0] }= undef
      while $values= $sth->fetchrow_arrayref;
}
printf "Found %d unique visitors\n", scalar keys %visitors;
=============================

Then I found out about "swap_inner_handles" and realised that *if* I could 
catch the depletion of the result set, I could do the $sth->next logic using 
swap_inner_handles.  And make it completely invisible to the outside world.


However, I've not been able to find a way to catch that state change.

I have tried:

  Stealing $sth->finish (on the assumption that that would be called when the 
result set depletes)

  Stealing DBI::st::STORE (on the assumption that the tie interface would be 
used to reset the Active attribute)

Neither of these work, unfortunately.


So I wonder, would it make sense to add a statement handle attribute like 
"last_record_seen" which would contain a code ref?  And have that code execute 
with the statement handle as its only parameter whenever the result set of a 
statement handle gets depleted, but before fetchrow_arrayref returns undef?  If 
that would exist, then it would simply be a matter of putting the 
swap_inner_handle logic in that code.

something like:

$sth->{last_record_seen}= sub {
    my ($old)= @_;
    $old->swap_inner_handle($new);  # wherever $new comes from, not important 
here
}

Then it would be possible to completely hide the sharding of the databases from 
our code.  Which would make life a lot easier on a lot of people now and in the 
future.



So, does that make sense?  Have I missed other avenues of action?

Thanks in advance for any thoughts on this!



Elizabeth Mattijsen

Reply via email to