Re: Is it possible to delegate data joins and filtering to the datasource?

Muhammad Gelbana Sat, 25 Mar 2017 17:38:04 -0700

Priceless information! Thank you all.

I managed to debug Drill in Eclipse, hoping to get a better understanding,
but I can't get my head around some things:


   - What is the purpose of these classes/interfaces:
      - ConverterRule
      - DrillRel
      - Prel
      - JdbcStoragePlugin.JdbcPrule
      - JdbcIntermediatePrel
   - What do the words *Prel* and *Prule* stand for? *Prel*iminary and
   *P*reliminary *Rule*?
   - What is a calling convention? (It is mentioned in *ConverterRule*'s
   documentation.)

Is there a way to configure the costing model for the JDBC plugin without
having to customize it through code? After all, my ultimate goal is to
push down filters and joins.

I'll continue debugging/browsing the code and come back with more
questions, or hopefully an achievement!

Thanks again, your help is very much appreciated.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Fri, Mar 24, 2017 at 1:29 AM, weijie tong <[email protected]> wrote:

> I am working on pushing down joins to the Druid storage plugin. In my
> experience, you first need to write a rule that uses your storage plugin's
> metadata to decide whether the joins can be pushed down; then, if they can,
> you transform the join node into a scan node that carries the relevant
> query information. The key point is to apply this rule in the HepPlanner.
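A minimal Java sketch of the join-pushdown rule described above. Every type in it (`Node`, `Scan`, `Join`, `pushJoinIntoScan`) is hypothetical and deliberately simplified; a real implementation would be an Apache Calcite rule operating on Drill's relational nodes, which this sketch does not attempt to reproduce. It only shows the decision and the rewrite: if both inputs of a join are scans of the same source, and that source's metadata (assumed here to be a simple predicate) says it can execute joins, the join is replaced by a single scan that carries the join.

```java
import java.util.function.Predicate;

/**
 * Toy model of rule-based join pushdown. None of these types are Drill or
 * Calcite classes; they only mirror the idea of collapsing a join into a scan.
 */
final class JoinPushdownSketch {

    /** Hypothetical plan node. */
    interface Node {}

    /** Scan of a table, or of a join that has already been pushed into the source. */
    static final class Scan implements Node {
        final String source;     // e.g. the storage plugin instance being read
        final String table;      // table name; null once a join has been pushed in
        final String pushedJoin; // description of the pushed join; null for a plain scan

        Scan(String source, String table, String pushedJoin) {
            this.source = source;
            this.table = table;
            this.pushedJoin = pushedJoin;
        }

        @Override
        public String toString() {
            return pushedJoin == null
                    ? "Scan(" + source + "." + table + ")"
                    : "Scan(" + source + ": " + pushedJoin + ")";
        }
    }

    /** Join of two inputs that Drill would execute itself. */
    static final class Join implements Node {
        final Node left;
        final Node right;
        final String condition;

        Join(Node left, Node right, String condition) {
            this.left = left;
            this.right = right;
            this.condition = condition;
        }

        @Override
        public String toString() {
            return "Join(" + left + ", " + right + ", " + condition + ")";
        }
    }

    /**
     * The "rule": rewrite bottom-up, and collapse a join into one scan when both
     * inputs read the same source and that source's metadata says it can join.
     */
    static Node pushJoinIntoScan(Node node, Predicate<String> sourceSupportsJoins) {
        if (!(node instanceof Join)) {
            return node;
        }
        Join join = (Join) node;
        Node left = pushJoinIntoScan(join.left, sourceSupportsJoins);
        Node right = pushJoinIntoScan(join.right, sourceSupportsJoins);
        if (left instanceof Scan && right instanceof Scan) {
            Scan l = (Scan) left;
            Scan r = (Scan) right;
            if (l.source.equals(r.source) && sourceSupportsJoins.test(l.source)) {
                String pushed = describe(l) + " JOIN " + describe(r) + " ON " + join.condition;
                return new Scan(l.source, null, pushed);
            }
        }
        return new Join(left, right, join.condition);
    }

    /** Text the source would receive for one join input. */
    private static String describe(Scan s) {
        return s.pushedJoin == null ? s.table : "(" + s.pushedJoin + ")";
    }

    public static void main(String[] args) {
        Predicate<String> supportsJoins = source -> source.equals("fastdb"); // hypothetical metadata
        Node plan = new Join(
                new Scan("fastdb", "orders", null),
                new Scan("fastdb", "customers", null),
                "orders.cid = customers.id");
        // Prints: Scan(fastdb: orders JOIN customers ON orders.cid = customers.id)
        System.out.println(pushJoinIntoScan(plan, supportsJoins));
    }
}
```

The recursion here stands in for the planner's own traversal. In Calcite, a rule would typically match a single join whose inputs are scans, and the planner would take care of re-applying it as the tree changes; HepPlanner applies matching rules heuristically rather than comparing the cost of alternatives.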
> On Fri, Mar 24, 2017 at 5:15 AM, Zelaine Fong <[email protected]> wrote:
>
> > The JDBC storage plugin does attempt to do pushdowns of joins.  However,
> > the Drill optimizer will evaluate different query plans.  In doing so, it
> > may choose an alternative plan that does not do a full pushdown if it
> > believes that’s a less costly plan than a full pushdown.  There are a
> > number of open bugs with the JDBC storage plugin, including DRILL-4696.
> > For that particular issue, I believe that when it was investigated, it was
> > determined that the costing model for the JDBC storage plugin needed more
> > work.  Hence Drill wasn’t picking the more optimal full pushdown plan.
> >
> > -- Zelaine
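To make the costing point above concrete, here is a toy Java sketch of cost-based plan selection. The row counts, the cost formula, and the `pushdownPenalty` knob are invented for illustration and are not Drill's cost model. The sketch only shows how an overestimated cost for the fully pushed-down alternative leads an optimizer to keep a plan in which Drill performs the join itself.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Toy cost-based plan selection. The numbers and the cost formula are invented
 * for illustration; they are not Drill's cost model.
 */
final class CostChoiceSketch {

    static final class Candidate {
        final String name;
        final double estimatedCost;

        Candidate(String name, double estimatedCost) {
            this.name = name;
            this.estimatedCost = estimatedCost;
        }
    }

    /**
     * Two alternatives for joining a 1,000,000-row table with a 10,000-row table
     * that yields 50,000 rows. pushdownPenalty scales the estimate for work done
     * inside the source (1.0 means the estimate is taken at face value).
     */
    static List<Candidate> candidates(double pushdownPenalty) {
        double ordersRows = 1_000_000;
        double customerRows = 10_000;
        double joinedRows = 50_000;

        // Full pushdown: the source joins and ships only the joined rows.
        Candidate pushdown = new Candidate("full pushdown", joinedRows * pushdownPenalty);

        // Join in Drill: both tables are shipped, then every shipped row is processed again by the join.
        double shipped = ordersRows + customerRows;
        Candidate local = new Candidate("join in Drill", shipped + shipped);

        return Arrays.asList(pushdown, local);
    }

    /** The optimizer keeps whichever alternative it estimates to be cheapest. */
    static Candidate cheapest(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble((Candidate c) -> c.estimatedCost))
                .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        System.out.println(cheapest(candidates(1.0)).name);   // full pushdown
        System.out.println(cheapest(candidates(100.0)).name); // join in Drill
    }
}
```

With `candidates(1.0)` the full pushdown wins (50,000 vs. 2,020,000). With `candidates(100.0)` the inflated estimate (5,000,000) makes the join-in-Drill plan look cheaper, even though the actual work has not changed. That is the kind of mis-costing the message attributes to the JDBC plugin.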
> >
> > On 3/23/17, 1:53 PM, "Paul Rogers" <[email protected]> wrote:
> >
> >     Hi Muhammad,
> >
> >     It seems that the goal for filters should be possible; I’m not
> > familiar enough with the code to know if joins are currently supported,
> > or if this is where you’d have to make some contributions to Drill.
> >
> >     The storage plugin is called at various places in the planning
> > process, and can insert planning rules. We have plugins that push down
> > filters, so this seems possible. For example, check Parquet and JDBC for
> > hints. See my answer to a previous question for hints on how to get
> > started with storage plugins.
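A toy Java sketch of the hook described above: the planner runs its phases in order and, in each phase, asks every storage plugin for the extra rules it wants to add. `Phase`, `Rule`, `PluginHook`, and `FILTER_INTO_SCAN` are hypothetical names, and a plan is reduced to a linear, top-down list of operator names. Drill's real plugin API and planning phases differ, so the actual method names should be checked in the storage plugin classes of the Drill source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Toy planner that asks storage plugins for extra rules in each phase. The
 * phases, the hook interface and the rule are hypothetical, not Drill's API.
 */
final class PluginRuleHookSketch {

    /** Hypothetical planning phases, run in declaration order. */
    enum Phase { LOGICAL, PHYSICAL }

    /** A rule rewrites a plan; here a plan is a linear top-down list of operator names. */
    interface Rule {
        List<String> apply(List<String> plan);
    }

    /** Hypothetical plugin hook: the rules a plugin wants to add in a given phase. */
    interface PluginHook {
        List<Rule> rulesFor(Phase phase);
    }

    /** Runs every phase, applying each rule contributed by each plugin for that phase. */
    static List<String> optimize(List<String> plan, List<PluginHook> plugins) {
        List<String> current = plan;
        for (Phase phase : Phase.values()) {
            for (PluginHook plugin : plugins) {
                for (Rule rule : plugin.rulesFor(phase)) {
                    current = rule.apply(current);
                }
            }
        }
        return current;
    }

    /** Example rule: fold a Filter sitting directly on a Scan into the Scan. */
    static final Rule FILTER_INTO_SCAN = plan -> {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < plan.size(); i++) {
            boolean filterOnScan = plan.get(i).equals("Filter")
                    && i + 1 < plan.size()
                    && plan.get(i + 1).equals("Scan");
            if (filterOnScan) {
                out.add("Scan+Filter");
                i++; // the Scan has been absorbed, so skip it
            } else {
                out.add(plan.get(i));
            }
        }
        return out;
    };

    public static void main(String[] args) {
        // A plugin that only contributes the filter rule during the logical phase.
        PluginHook myPlugin = phase -> phase == Phase.LOGICAL
                ? Collections.singletonList(FILTER_INTO_SCAN)
                : Collections.<Rule>emptyList();
        List<String> plan = Arrays.asList("Project", "Filter", "Scan");
        System.out.println(optimize(plan, Collections.singletonList(myPlugin))); // [Project, Scan+Filter]
    }
}
```

Keeping the rules with the plugin means the planner itself needs no knowledge of any particular data source; it only has to ask each plugin at the right points.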
> >
> >     Joins may be a bit more complex. You’d have to insert planner rules;
> > such code *may* be available, or may require extensions to Drill. Drill
> > should certainly do this, so if the code is not there, we’d welcome your
> > contribution.
> >
> >     You’d have to create a rule that creates a new scan operator that
> > includes the information you wish to push down. For example, if you push
> > a filter, the scan definition (AKA group scan and scan entry) would need
> > to hold the information needed to implement the push-down. Again, you can
> > probably find examples of filters; you’d have to be creative to push
> > joins.
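A minimal sketch of a scan definition that carries a pushed-down filter. `Condition`, `ScanSpec`, `Split`, and `pushFilter` are hypothetical stand-ins for a plugin's group scan and scan entry, not Drill classes. The rule pushes only the operators the source is assumed to support and leaves the rest for Drill to evaluate. It also returns a new scan definition rather than modifying the existing one, which seems the safer choice when the planner may still be holding the original as part of another alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy scan definition that carries pushed-down filters; not a Drill class. */
final class PushedFilterSketch {

    /** Hypothetical predicate of the form: column op literal. */
    static final class Condition {
        final String column;
        final String op;
        final String literal;

        Condition(String column, String op, String literal) {
            this.column = column;
            this.op = op;
            this.literal = literal;
        }

        @Override
        public String toString() {
            return column + " " + op + " " + literal;
        }
    }

    /** Hypothetical scan definition: what to read plus filters the source must apply. */
    static final class ScanSpec {
        final String table;
        final List<Condition> pushedFilters;

        ScanSpec(String table, List<Condition> pushedFilters) {
            this.table = table;
            this.pushedFilters = Collections.unmodifiableList(new ArrayList<>(pushedFilters));
        }

        /** Returns a new spec instead of mutating this one. */
        ScanSpec withPushedFilters(List<Condition> more) {
            List<Condition> all = new ArrayList<>(pushedFilters);
            all.addAll(more);
            return new ScanSpec(table, all);
        }
    }

    /** Output of the rule: the new scan plus conditions Drill must still evaluate. */
    static final class Split {
        final ScanSpec scan;
        final List<Condition> residual;

        Split(ScanSpec scan, List<Condition> residual) {
            this.scan = scan;
            this.residual = residual;
        }
    }

    /**
     * Pushes the conditions whose operator the source supports and keeps the rest.
     * Assumes the conditions are ANDed together; splitting an OR this way would be wrong.
     */
    static Split pushFilter(ScanSpec scan, List<Condition> conjuncts, Set<String> supportedOps) {
        List<Condition> pushable = new ArrayList<>();
        List<Condition> residual = new ArrayList<>();
        for (Condition c : conjuncts) {
            (supportedOps.contains(c.op) ? pushable : residual).add(c);
        }
        return new Split(scan.withPushedFilters(pushable), residual);
    }

    public static void main(String[] args) {
        ScanSpec scan = new ScanSpec("orders", Collections.<Condition>emptyList());
        List<Condition> where = Arrays.asList(
                new Condition("status", "=", "'SHIPPED'"),
                new Condition("total", ">", "100"),
                new Condition("note", "LIKE", "'%rush%'"));
        Set<String> supported = new HashSet<>(Arrays.asList("=", ">")); // assumed source capability
        Split split = pushFilter(scan, where, supported);
        System.out.println("pushed:   " + split.scan.pushedFilters); // [status = 'SHIPPED', total > 100]
        System.out.println("residual: " + split.residual);          // [note LIKE '%rush%']
    }
}
```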
> >
> >     Assembling the pieces: your plugin would add planner rules that
> > determine when joins can be pushed. Those rules would cause your plugin to
> > create a semantic node (group scan) that holds the required information.
> > The planner then converts group scan nodes to specific plans passed to
> > the execution engine. On the execution side, your plugin provides a
> > “Record Reader” for your format, and that reader does the actual work to
> > push the filter or join down to your data source.
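On the execution side, here is a sketch of a reader that turns the pushed-down work into a single native request to the source and returns the results in batches. `SourceClient`, `PushdownReader`, and the SQL-like query text are assumptions about a hypothetical source; Drill's actual record reader interface is different and is not modeled here. The `from` argument can be a plain table name or a pushed-down join expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Toy execution-side reader for a hypothetical source; not Drill's RecordReader. */
final class PushdownReaderSketch {

    /** Hypothetical client for the source: runs one native query and streams rows. */
    interface SourceClient {
        Iterator<List<String>> execute(String nativeQuery);
    }

    static final class PushdownReader {
        private final SourceClient client;
        private final String nativeQuery;
        private final int batchSize;
        private Iterator<List<String>> rows; // opened lazily on the first next()

        /**
         * @param from          a table name or a pushed-down join expression
         * @param pushedFilters conditions the planner pushed into the scan (ANDed)
         */
        PushdownReader(SourceClient client, String from, List<String> pushedFilters, int batchSize) {
            this.client = client;
            this.batchSize = batchSize;
            // Render the pushed work in the source's own (assumed SQL-like) language.
            // A real reader must render literals safely instead of concatenating text.
            String where = pushedFilters.isEmpty() ? "" : " WHERE " + String.join(" AND ", pushedFilters);
            this.nativeQuery = "SELECT * FROM " + from + where;
        }

        String nativeQuery() {
            return nativeQuery;
        }

        /** Returns up to batchSize rows; an empty batch means the source is exhausted. */
        List<List<String>> next() {
            if (rows == null) {
                rows = client.execute(nativeQuery); // the source does the filtering/joining
            }
            List<List<String>> batch = new ArrayList<>();
            while (batch.size() < batchSize && rows.hasNext()) {
                batch.add(rows.next());
            }
            return batch;
        }
    }

    public static void main(String[] args) {
        // Fake source that ignores the query text and returns three canned rows.
        SourceClient fake = query -> Arrays.asList(
                Arrays.asList("1", "a"), Arrays.asList("2", "b"), Arrays.asList("3", "c")).iterator();
        PushdownReader reader = new PushdownReader(fake, "orders",
                Collections.singletonList("status = 'SHIPPED'"), 2);
        System.out.println(reader.nativeQuery()); // SELECT * FROM orders WHERE status = 'SHIPPED'
        for (List<List<String>> b = reader.next(); !b.isEmpty(); b = reader.next()) {
            System.out.println(b); // [[1, a], [2, b]] then [[3, c]]
        }
    }
}
```

Opening the remote query lazily keeps planning and setup free of side effects on the source; the single request made on the first `next()` is where the source does the join and filter work.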
> >
> >     Your best bet is to mine existing plugins for ideas, and then
> > experiment. Start simply and gradually add functionality. And, ask
> > questions back on this list.
> >
> >
> >     Thanks,
> >
> >     - Paul
> >
> >     > On Mar 22, 2017, at 8:20 AM, Muhammad Gelbana <[email protected]> wrote:
> >     >
> >     > I'm trying to use Drill with a proprietary datasource that is very
> >     > fast at applying data joins (i.e. SQL joins) and query filters (i.e.
> >     > SQL WHERE conditions).
> >     >
> >     > To connect to that datasource, I first have to write a storage
> >     > plugin, but I'm not sure whether my main goal is achievable.
> >     >
> >     > My main goal is to configure Drill to let the datasource perform
> >     > joins and filters and only return the resulting data. Drill can then
> >     > perform further processing based on the original SQL query sent to
> >     > Drill.
> >     >
> >     > Is this possible by developing a storage plugin? Where exactly should
> >     > I be looking?
> >     >
> >     > I've been going through this wiki
> >     > <https://github.com/paul-rogers/drill/wiki> and I don't think I
> >     > understood every concept. So if there is another source of
> >     > information about storage plugin development, please point it out.
> >     >
> >     > *---------------------*
> >     > *Muhammad Gelbana*
> >     > http://www.linkedin.com/in/mgelbana
> >
> >
> >
> >
>
