Hi Muhammad,

Pushing filters down to your data source should be possible. I’m not familiar 
enough with the code to know if join push-down is currently supported, or if 
this is where you’d have to make some contributions to Drill.

The storage plugin is called at various places in the planning process, and can 
insert planning rules. We have plugins that push down filters, so this seems 
possible. For example, check the Parquet and JDBC plugins for hints. See my 
answer to a previous question for pointers on how to get started with storage 
plugins.

Joins may be a bit more complex. You’d have to insert planner rules; such code 
*may* be available, or may require extensions to Drill. Drill should certainly 
do this, so if the code is not there, we’d welcome your contribution.

You’d have to create a rule that creates a new scan operator that includes the 
information you wish to push down. For example, if you push a filter, the scan 
definition (AKA group scan and scan entry) would need to hold the information 
needed to implement the push-down. Again, you can probably find examples for 
filters; you’d have to be creative to push joins.
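
To make the idea concrete, here is a minimal sketch in plain Java of a rule 
that folds a filter into the scan definition. Every name in it (PlanNode, 
ScanSpec, ScanNode, FilterNode, FilterIntoScanRule) is made up for 
illustration; none of them are Drill or Calcite classes, and a real rule would 
work on expression trees rather than strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// All names here are hypothetical; this does not use Drill's planner API.
class PushDownRuleSketch {
    public static void main(String[] args) {
        PlanNode plan = new FilterNode("region = 'EU'",
                new ScanNode(new ScanSpec("orders", Collections.emptyList())));
        // Fall back to the original plan when the rule does not match.
        PlanNode rewritten = FilterIntoScanRule.apply(plan).orElse(plan);
        System.out.println(rewritten);
    }
}

interface PlanNode {}

// The "scan definition": it must carry everything the reader needs later.
final class ScanSpec {
    final String table;
    final List<String> filters;

    ScanSpec(String table, List<String> filters) {
        this.table = table;
        this.filters = Collections.unmodifiableList(new ArrayList<>(filters));
    }

    // Returns a copy of this spec with one more pushed-down predicate.
    ScanSpec withFilter(String predicate) {
        List<String> copy = new ArrayList<>(filters);
        copy.add(predicate);
        return new ScanSpec(table, copy);
    }
}

final class ScanNode implements PlanNode {
    final ScanSpec spec;

    ScanNode(ScanSpec spec) { this.spec = spec; }

    @Override
    public String toString() {
        return "Scan(" + spec.table + ", filters=" + spec.filters + ")";
    }
}

final class FilterNode implements PlanNode {
    final String predicate;
    final PlanNode input;

    FilterNode(String predicate, PlanNode input) {
        this.predicate = predicate;
        this.input = input;
    }
}

final class FilterIntoScanRule {
    // Matches Filter-over-Scan and replaces it with a Scan holding the filter.
    static Optional<PlanNode> apply(PlanNode node) {
        if (node instanceof FilterNode) {
            FilterNode filter = (FilterNode) node;
            if (filter.input instanceof ScanNode) {
                ScanNode scan = (ScanNode) filter.input;
                return Optional.of(new ScanNode(scan.spec.withFilter(filter.predicate)));
            }
        }
        return Optional.empty();
    }
}
```

The point of the sketch is only the shape: the rule removes the filter node 
from the plan and leaves a new scan whose definition remembers the predicate.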

Assembling the pieces: your plugin would add planner rules that determine when 
joins can be pushed. Those rules would cause your plugin to create a semantic 
node (group scan) that holds the required information. The planner then 
converts group scan nodes to specific plans passed to the execution engine. On 
the execution side, your plugin provides a “Record Reader” for your format, and 
that reader does the actual work to push the filter or join down to your data 
source.
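
The execution side might look roughly like the sketch below, again in plain 
Java with invented names (PushedScan, SourceClient, SketchRecordReader). It 
assumes, purely for illustration, that your data source accepts SQL-like query 
text; it does not implement Drill’s actual record reader interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// All names here are hypothetical; this does not use Drill's execution API.
class ReaderSketch {
    public static void main(String[] args) {
        PushedScan scan = new PushedScan(
                Arrays.asList("orders o", "customers c"),
                Arrays.asList("o.cust_id = c.id"),
                Arrays.asList("c.region = 'EU'"));
        // In-memory stand-in for the real data source.
        SourceClient fake = query -> {
            System.out.println("source received: " + query);
            return Arrays.asList(Arrays.asList("1", "EU")).iterator();
        };
        SketchRecordReader reader = new SketchRecordReader(scan, fake);
        while (reader.hasNext()) {
            System.out.println(String.join(",", reader.next()));
        }
    }
}

// What the planner hands to the reader: tables plus pushed joins and filters.
final class PushedScan {
    final List<String> tables;
    final List<String> joinConditions;
    final List<String> filters;

    PushedScan(List<String> tables, List<String> joinConditions, List<String> filters) {
        this.tables = tables;
        this.joinConditions = joinConditions;
        this.filters = filters;
    }

    // Assumption: the source understands SQL-like text.
    String toQuery() {
        List<String> predicates = new ArrayList<>(joinConditions);
        predicates.addAll(filters);
        String sql = "SELECT * FROM " + String.join(", ", tables);
        return predicates.isEmpty() ? sql : sql + " WHERE " + String.join(" AND ", predicates);
    }
}

// Stand-in for whatever client library talks to the data source.
interface SourceClient {
    Iterator<List<String>> execute(String query);
}

// The reader sends the pushed-down query once, then just iterates the rows.
final class SketchRecordReader implements Iterator<List<String>> {
    private final Iterator<List<String>> rows;

    SketchRecordReader(PushedScan scan, SourceClient client) {
        this.rows = client.execute(scan.toQuery());
    }

    @Override
    public boolean hasNext() { return rows.hasNext(); }

    @Override
    public List<String> next() { return rows.next(); }
}
```

Here the join and the filter both travel to the source inside one query, so 
the reader only has to hand the returned rows back for further processing.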

Your best bet is to mine existing plugins for ideas, and then experiment. Start 
simply and gradually add functionality. And, ask questions back on this list.


Thanks,

- Paul

> On Mar 22, 2017, at 8:20 AM, Muhammad Gelbana <m.gelb...@gmail.com> wrote:
> 
> I'm trying to use Drill with a proprietary datasource that is very fast in
> applying data joins (i.e. SQL joins) and query filters (i.e. SQL where
> conditions).
> 
> To connect to that datasource, I first have to write a storage plugin, but
> I'm not sure if my main goal is applicable.
> 
> My main goal is to configure Drill to let the datasource perform JOINS and
> filters and only return the data. Then Drill can perform further processing
> based on the original SQL query sent to Drill.
> 
> Is this possible by developing a storage plugin ? Where exactly should I be
> looking ?
> 
> I've been going through this wiki
> <https://github.com/paul-rogers/drill/wiki> and I don't think I understood
> every concept. So if there is another source of information about storage
> plugins development, please point it out.
> 
> *---------------------*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
