Enhancement suggestions for the Drill JDBC Plugin

Magnus Pierre Wed, 16 Sep 2015 09:44:17 -0700

Hello,

I am not a developer, and don’t aim to become one either, even though I cheat
by writing code now and then. I do, however, have lots of ideas for optimizing
the recently completed JDBC plugin (DRILL-3180).
Since a JDBC scan currently runs on a single drillbit, it would be good to use
the fact that most databases store data in partitions and provide ways to
access them individually, so that a query could be spread over multiple
drillbits.
For instance, Oracle has named partitions that can be queried individually:
SELECT * FROM employees PARTITION (p1);
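
As a rough sketch of how that could be used, the helper below turns a table
name and a list of partition names into one query per partition, which could
then be handed out to different drillbits. The class and method names are
made up for illustration and are not part of Drill; the partition names are
assumed to have been read from the database's metadata beforehand and to be
trusted identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Drill or the JDBC storage plugin.
class PartitionQueries {

    // Builds one SELECT per named partition using Oracle's PARTITION (name) clause.
    // Assumes table and partition names are trusted identifiers taken from metadata.
    static List<String> perPartition(String table, List<String> partitions) {
        List<String> queries = new ArrayList<>();
        for (String partition : partitions) {
            queries.add("SELECT * FROM " + table + " PARTITION (" + partition + ")");
        }
        return queries;
    }

    public static void main(String[] args) {
        // Partition names p1..p3 are example values only.
        for (String q : perPartition("employees", Arrays.asList("p1", "p2", "p3"))) {
            System.out.println(q);
        }
    }
}
```

Each generated query touches exactly one partition, so the queries are
independent of each other and could in principle be executed concurrently.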

DB2 has partition elimination: if you query within a partition's range, only a
single or a few partitions provide the data. By probing metadata you could
divide a select query across the partitions and by that be able to extract the
information in parallel.
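
A minimal sketch of the range idea, assuming a numeric partitioning column
whose minimum and maximum values are already known: the range is cut into
contiguous, non-overlapping slices and one query is generated per slice. The
names used here (RangeSplitter, orders, order_id) are hypothetical. A real
implementation would more likely take the boundaries from the partition
metadata instead of splitting evenly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill or the JDBC storage plugin.
class RangeSplitter {

    // Splits the inclusive range [min, max] into at most `parts` contiguous,
    // non-overlapping slices and returns one range query per slice.
    static List<String> split(String table, String column, long min, long max, int parts) {
        if (parts <= 0 || max < min) {
            throw new IllegalArgumentException("need parts > 0 and max >= min");
        }
        List<String> queries = new ArrayList<>();
        long total = max - min + 1;
        long step = (total + parts - 1) / parts; // ceiling division
        for (long lo = min; lo <= max; lo += step) {
            long hi = Math.min(lo + step - 1, max);
            queries.add("SELECT * FROM " + table + " WHERE " + column
                    + " BETWEEN " + lo + " AND " + hi);
        }
        return queries;
    }

    public static void main(String[] args) {
        // Example values only: ids 1..100 split into 4 slices.
        for (String q : split("orders", "order_id", 1, 100, 4)) {
            System.out.println(q);
        }
    }
}
```

If the slices line up with the partition ranges, each query should only need
to touch one or a few partitions.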

Teradata does not have partitions in the same sense, since everything is
hash-distributed, but there you could optimize the query execution in other
ways. Teradata supports something called multi-statement requests, meaning
that you have one request consisting of many queries separated by ; and when
it executes, it combines all the queries into shared steps, i.e. one query
plan, which makes the complete execution more efficient and less costly
(basically eliminating lots of spool usage and table access). Each query's
result is then returned as an individual result set, and could therefore be
read in parallel.

Example code:
https://developer.teradata.com/doc/connectivity/jdbc/reference/current/samp/T20701JD.java.txt

The key point here is that the queries need to be issued as a multi-statement
request; otherwise the optimizations will not take place.
With some simple knowledge of the source table you could then turn a simple
query into a set of range queries and run them as one multi-statement request,
to get spool elimination as well as parallel reads.
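
To make this a bit more concrete, here is a minimal sketch using only standard
JDBC calls. It joins several queries with ; into one request string and then
walks through the returned result sets with Statement.getMoreResults(). The
class and method names are made up, and the sketch assumes the driver accepts
such a request through Statement.execute(). Note that it reads the result sets
one after another; whether they can be consumed concurrently depends on the
driver and is not shown here.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill or of any JDBC driver.
class MultiStatementRequest {

    // Joins the individual queries with ';' so they are sent as a single request.
    static String build(List<String> queries) {
        return String.join(";", queries);
    }

    // Executes the request and counts the rows in each returned result set.
    // Assumes the driver returns one result set per query in the request.
    static List<Integer> rowCounts(Connection conn, String request) throws SQLException {
        List<Integer> counts = new ArrayList<>();
        try (Statement stmt = conn.createStatement()) {
            boolean isResultSet = stmt.execute(request);
            // Standard JDBC loop: stop when there is neither a result set nor an update count.
            while (isResultSet || stmt.getUpdateCount() != -1) {
                if (isResultSet) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        int n = 0;
                        while (rs.next()) {
                            n++;
                        }
                        counts.add(n);
                    }
                }
                isResultSet = stmt.getMoreResults();
            }
        }
        return counts;
    }
}
```

The range queries for build() could come from a split of the source table's
key range, so that each result set corresponds to one slice of the table.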

These are just a few options. Is anyone interested in picking this up? I don’t
think there’s one strategy that fits all databases, but these would be very
good enhancements for those databases that do support partitions or
functionality like multi-statement requests.

Regards,
Magnus
