Hello,

I am not a developer, and don’t aim to become one either, even though I cheat 
by writing code now and then. I do, however, have lots of ideas for optimizing 
the recently completed JDBC plugin (DRILL-3180).
Since a JDBC query currently runs on a single drillbit, it would be good to use 
the fact that most databases both store data in partitions and provide means of 
accessing individual partitions, so that the query can run across multiple 
drillbits.
For instance, Oracle has named partitions that can be queried individually:
SELECT * FROM employees PARTITION (p1);
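
A minimal sketch of how a query could be split per partition, in Java. The 
class and method names are hypothetical and not part of the Drill JDBC plugin, 
and the partition names are assumed to be known already (they could, for 
instance, be read from Oracle's ALL_TAB_PARTITIONS dictionary view).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Drill's JDBC storage plugin.
class OraclePartitionSplitter {

    // Builds one SELECT per named partition, so that each query could be
    // handed to a different drillbit.
    static List<String> splitByPartition(String table, List<String> partitions) {
        List<String> queries = new ArrayList<>();
        for (String partition : partitions) {
            queries.add("SELECT * FROM " + table + " PARTITION (" + partition + ")");
        }
        return queries;
    }
}
```

Calling splitByPartition("employees", ...) with partitions p1 and p2 would give 
two independent queries, one per partition.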

DB2 has partition elimination: if you query within the partition ranges, only a 
single partition or a few partitions provide the data. By probing metadata you 
could divide a select query across the partitions and thereby extract the data 
in parallel.
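
A rough sketch of the range idea in Java, assuming the partition boundaries 
have already been read from metadata and that the partitioning column is 
numeric. The class, method, table and column names are made up for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Drill's JDBC storage plugin.
class RangeSplitter {

    // bounds = {b0, b1, ..., bn} yields n queries over the half-open ranges
    // [b0, b1), [b1, b2), ... Rows outside [b0, bn) are not covered.
    static List<String> splitByRange(String table, String column, long[] bounds) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i + 1 < bounds.length; i++) {
            queries.add("SELECT * FROM " + table
                    + " WHERE " + column + " >= " + bounds[i]
                    + " AND " + column + " < " + bounds[i + 1]);
        }
        return queries;
    }
}
```

If the ranges line up with the partition ranges, each query should touch only 
one partition, and the queries can be issued from different drillbits.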

Teradata does not have partitions in the same sense, since everything is 
hashed, but there you could optimize query execution in other ways. 
Teradata does, however, support multi-statement requests: one request 
consisting of many queries separated by ;. When it executes, Teradata combines 
all the queries into shared steps, i.e. one query plan, which makes the 
complete execution more efficient and less costly (basically eliminating a lot 
of spool usage and table access). Each query's result is then returned as an 
individual result set, and could therefore be read in parallel.

Example code:
https://developer.teradata.com/doc/connectivity/jdbc/reference/current/samp/T20701JD.java.txt

The key point is that the queries must be issued as a multi-statement request; 
otherwise the optimizations will not take place.
With some simple knowledge of the source table you could turn a simple query 
into a set of range queries and run them as one multi-statement request, to 
get spool elimination as well as parallel reads.
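
A small sketch of that flow using only the standard JDBC API. The class and 
method names are hypothetical, and joining the queries with "; " is an 
assumption about how the request text should look; the linked Teradata sample 
is the authority there. Note that this sketch walks the result sets one after 
another on a single Statement, so reading them in parallel would need more 
work.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper for illustration; not part of Drill's JDBC storage plugin.
class MultiStatementRequest {

    // Joins the individual queries into one request text (assumed format).
    static String build(List<String> queries) {
        return String.join("; ", queries);
    }

    // Executes the request and counts the rows of every returned result set.
    static int countRows(Connection conn, String request) throws SQLException {
        int rows = 0;
        try (Statement stmt = conn.createStatement()) {
            boolean isResultSet = stmt.execute(request);
            // Per the JDBC contract, there are no more results when the
            // current result is not a ResultSet and getUpdateCount() is -1.
            while (isResultSet || stmt.getUpdateCount() != -1) {
                if (isResultSet) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        while (rs.next()) {
                            rows++;
                        }
                    }
                }
                isResultSet = stmt.getMoreResults();
            }
        }
        return rows;
    }
}
```

The range queries from a simple splitter could be passed to build() and the 
resulting text sent as one request.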

These are just a few options. Is anyone interested in picking this up? I don’t 
think there’s one strategy that fits all databases, but these would be very 
good enhancements for those databases that do support partitions or 
functionality like multi-statement requests.

Regards,
Magnus



