On Mar 14, 2012, at 1:35 PM, Elizabeth Mattijsen wrote:
> Maybe some background first:
>
> I'm working on code that does queries on large tables. To be able to support
> the growth of these tables, we need to shard the tables to multiple database
> servers. One way to shard this evenly, would be to shard per second of data:
> even seconds go to one database, odd seconds go to another.
>
> We have a lot of code that creates minutely / hourly / daily aggregations (in
> Perl, the database just cannot do the kind of aggregations that we need) by
> querying these tables.
>
> When moving to sharded tables, I would like to keep the changes to the code
> as minimal as possible.
>
> By monkeypatching DBI::st, I've been able to reduce the code change to only
> having to add another loop construct to the code, so something like:
>
> =============================
> my %visitors;
> my $values;
> $visitors{ $values->[0] }= undef
> while $values= $sth->fetchrow_arrayref;
> printf "Found %d unique visitors\n", scalar keys %visitors;
> =============================
>
> Becomes:
>
> =============================
> my %visitors;
> my $values;
> while ( $sth= $sth->next ) {
> $visitors{ $values->[0] }= undef
> while $values= $sth->fetchrow_arrayref;
> }
> printf "Found %d unique visitors\n", scalar keys %visitors;
> =============================
>
> Then I found out about "swap_inner_handles" and realised that *if* I could
> catch the depletion of the result set, I could do the $sth->next logic using
> swap_inner_handles. And make it completely invisible to the outside world.
>
>
> However, I've not been able to find a way to catch that state change.
>
> I have tried:
>
> Stealing $sth->finish (on the assumption that that would be called when the
> result set depletes)
>
> Stealing DBI::st::STORE (on the assumption that the tie interface would be
> used to reset the Active attribute)
>
> Neither of these work, unfortunately.
>
>
> So I wonder, would it make sense to add a statement handle attribute like
> "last_record_seen" which would contain a code ref? And have that code
> execute with the statement handle as its only parameter whenever the result
> set of a statement handle gets depleted, but before fetchrow_arrayref returns
> undef? If that would exist, then it would simply be a matter of putting the
> swap_inner_handle logic in that code.
>
> something like:
>
> $sth->{last_record_seen}= sub {
> my ($old)= @_;
> $old->swap_inner_handle($new); # wherever $new comes from, not important
> here
> }
Actually, this callback would need to return whether or not to call the
original fetch code again. Something like:
$sth->{last_record_seen}= sub {
my ($old)= @_;
if ($new) { # wherever $new comes from, not important here
$old->swap_inner_handle($new);
return 1; # please try again
}
return 0; # we're really done, thank you
}
Liz