Re: Is it possible to delegate data joins and filtering to the datasource ?

Zelaine Fong Thu, 23 Mar 2017 14:15:17 -0700

The JDBC storage plugin does attempt to do pushdowns of joins.  However, the 
Drill optimizer will evaluate different query plans.  In doing so, it may 
choose an alternative plan that does not do a full pushdown if it believes 
that’s a less costly plan than a full pushdown.  There are a number of open 
bugs with the JDBC storage plugin, including DRILL-4696.  For that particular 
issue, I believe that when it was investigated, it was determined that the 
costing model for the JDBC storage plugin needed more work.  Hence Drill wasn’t 
picking the more optimal full pushdown plan.
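
To illustrate the point about costing, here is a minimal toy sketch of how a cost-based choice between plans can reject a pushdown. It is not Drill's or Calcite's planner code: the class names, plan descriptions, and cost numbers are all made up for illustration. It only shows that the planner compares estimated costs, so an overestimated pushdown cost leads it to the other plan.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; these are NOT Drill or Calcite classes.
class PlanChoiceSketch {

    // One candidate plan together with the cost the model assigns to it.
    static final class CandidatePlan {
        final String description;
        final double estimatedCost; // the model's belief, not the true runtime cost

        CandidatePlan(String description, double estimatedCost) {
            this.description = description;
            this.estimatedCost = estimatedCost;
        }
    }

    // Returns the candidate with the lowest estimated cost (null for an empty list).
    static CandidatePlan cheapest(List<CandidatePlan> candidates) {
        CandidatePlan best = null;
        for (CandidatePlan plan : candidates) {
            if (best == null || plan.estimatedCost < best.estimatedCost) {
                best = plan;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up numbers: the pushdown plan is deliberately overestimated.
        List<CandidatePlan> candidates = Arrays.asList(
            new CandidatePlan("full join pushdown to the JDBC source", 500.0),
            new CandidatePlan("two JDBC scans joined inside Drill", 120.0));
        System.out.println(cheapest(candidates).description);
    }
}
```

In this toy model, lowering the estimated cost of the pushdown plan is all it takes for it to be chosen, which is why the accuracy of the costing model matters.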


-- Zelaine

On 3/23/17, 1:53 PM, "Paul Rogers" <[email protected]> wrote:

    Hi Muhammad,
    
    It seems that the goal for filters should be possible; I’m not familiar 
enough with the code to know if joins are currently supported, or if this is 
where you’d have to make some contributions to Drill.
    
    The storage plugin is called at various places in the planning process, and 
can insert planning rules. We have plugins that push down filters, so this 
seems possible. For example, check Parquet and JDBC for hints. See my answer to 
a previous question for hints on how to get started with storage plugins.
    
    Joins may be a bit more complex. You’d have to insert planner rules; such 
code *may* be available, or may require extensions to Drill. Drill should 
certainly do this, so if the code is not there, we’d welcome your contribution.
    
    You’d have to create a rule that creates a new scan operator that includes 
the information you wish to push down. For example, if you push a filter, the 
scan definition (AKA group scan and scan entry) would need to hold the 
information needed to implement the push-down. Again, you can probably find 
examples of filters; you’d have to be creative to push joins.
    
    Assembling the pieces: your plugin would add planner rules that determine 
when joins can be pushed. Those rules would cause your plugin to create a 
semantic node (group scan) that holds the required information. The planner 
then converts group scan nodes to specific plans passed to the execution 
engine. On the execution side, your plugin provides a “Record Reader” for your 
format, and that reader does the actual work to push the filter or join down to 
your data source.
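
The flow described above (rule, then scan definition, then reader) can be sketched roughly as follows. This is a minimal sketch using hypothetical stand-in classes, not Drill's actual plugin interfaces: `ScanSpec`, `pushFilterIfSupported`, and `toSourceQuery` are invented names, and it assumes the data source accepts SQL-like text. A real plugin would implement Drill's own planner-rule, group-scan, and record-reader classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are Drill classes.
class PushdownSketch {

    // Plays the role of the group scan: an immutable description of what to read.
    static final class ScanSpec {
        final String table;
        final List<String> pushedFilters;

        ScanSpec(String table, List<String> pushedFilters) {
            this.table = table;
            this.pushedFilters =
                Collections.unmodifiableList(new ArrayList<String>(pushedFilters));
        }

        // Returns a new scan that carries one more pushed-down condition.
        ScanSpec withFilter(String condition) {
            List<String> filters = new ArrayList<String>(pushedFilters);
            filters.add(condition);
            return new ScanSpec(table, filters);
        }
    }

    // Plays the role of the planner rule: push the filter only if the source can
    // evaluate it; otherwise leave the scan unchanged so Drill applies the filter.
    static ScanSpec pushFilterIfSupported(ScanSpec scan, String condition,
                                          boolean sourceSupportsIt) {
        return sourceSupportsIt ? scan.withFilter(condition) : scan;
    }

    // Plays the role of the record reader's setup: turn the scan description
    // into the request that is actually sent to the data source.
    static String toSourceQuery(ScanSpec scan) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(scan.table);
        if (!scan.pushedFilters.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", scan.pushedFilters));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        ScanSpec plain = new ScanSpec("orders", new ArrayList<String>());
        ScanSpec pushed = pushFilterIfSupported(plain, "amount > 100", true);
        System.out.println(toSourceQuery(pushed));
    }
}
```

The same shape would extend to joins: the rule would replace two scans and a join with a single scan description that names both tables and the join condition, and the reader would translate that into one request to the source.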
    
    Your best bet is to mine existing plugins for ideas, and then experiment. 
Start simply and gradually add functionality. And, ask questions back on this 
list.
    
    
    Thanks,
    
    - Paul
    
    > On Mar 22, 2017, at 8:20 AM, Muhammad Gelbana <[email protected]> wrote:
    > 
    > I'm trying to use Drill with a proprietary datasource that is very fast in
    > applying data joins (i.e. SQL joins) and query filters (i.e. SQL where
    > conditions).
    > 
    > To connect to that datasource, I first have to write a storage plugin, but
    > I'm not sure if my main goal is applicable.
    > 
    > My main goal is to configure Drill to let the datasource perform JOINS and
    > filters and only return the data. Then Drill can perform further processing
    > based on the original SQL query sent to Drill.
    > 
    > Is this possible by developing a storage plugin ? Where exactly should I be
    > looking ?
    > 
    > I've been going through this wiki
    > <https://github.com/paul-rogers/drill/wiki> and I don't think I understood
    > every concept. So if there is another source of information about storage
    > plugins development, please point it out.
    > 
    > *---------------------*
    > *Muhammad Gelbana*
    > http://www.linkedin.com/in/mgelbana
