Re: cross database queries?

Jens Rehsack Wed, 26 Jun 2013 07:14:01 -0700

On 26.06.13 15:25, David Nicol wrote:

On Wed, Jun 26, 2013 at 7:26 AM, Andrew Snyder <[email protected]
<mailto:[email protected]>> wrote:


    I want to write a query like:

    select clients.client.client_id, columnar.sales.total_sales,
    web.page_hits from clients, columnar, web
    where clients.client_id = columnar.client_id
    and  clients.client_id = web.client_id

    in a system where 'clients' is actually one or more relational
    databases, 'columnar' is one or columnar databases, and 'web' is the
    Apache logs on one or more web servers.  The dbi driver would be
    configured to connect to the correct databases and filter web hits
    based on 'client_id'.

    Has somebody written that already?

    Thanks,
    Andrew



it seems like the right thing to do here would be to do three queries,
against the three data sources, and store all the results in a hash of
arrays, then dump the results. Any solution that automates it will wind
up doing at least that anyway, and might not be optimized for the join.


S::S is not bad in joining tables - it's bad in optimizing queries.
I cannot fetch the smalles table first or first query those tables which
have constants in where clauses. So bad queries might cause gigabytes
on memory are wasted for resulting 50 lines.

That is what I meant by "not optimized" :)

Unless there really are so many client IDs that you need to process the
results as a stream or run out of memory, which is unlikely.


     while (my ($c_id, $ar) = each %resultz){
           $ar->[0] or next;   # filter out client_id not appearing in
clients database
           print join( "\t", $c_id, 0+$ar->[1], 0+$ar->[2]),"\n";
     }


Well - even if not optimized, the implementation of SQL::Statement is
even better. And the datasources for S::S are easy to write - it
finally requires an open_table and fetch_row method.

Two parallel hashes containing the web and columnar results, accessed
once for each result from querying the clients table, would also work.


Yes, that's something S::S can't do out of the box. But it could do, I
have a version of DBD::Sys where during open_table() the query is send
against the data-sources and first fetch_row() synchronized the results
of the appropriate queries.

Cheers
--
Jens Rehsack

Re: cross database queries?

Reply via email to