Priceless information! Thank you all.
I managed to debug Drill in Eclipse, hoping to get a better understanding,
but I can't get my head around some things:
- What is the purpose of these classes/interfaces:
  - ConverterRule
  - DrillRel
  - Prel
  - JdbcStoragePlugin.JdbcPrule
  - JdbcIntermediatePrel
- What do the words *Prel* and *Prule* stand for? *Prel*iminary and
  *P*reliminary *Rule*?
- What is a calling convention? (It is mentioned, for example, in
  *ConverterRule*'s documentation.)
Is there a way to configure the costing model for the JDBC plugin without
having to customize it through code? After all, my ultimate goal is to
push down filters and joins.
I'll continue debugging/browsing the code and come back with more
questions, or hopefully an achievement!
Thanks again, your help is very much appreciated.
*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana
On Fri, Mar 24, 2017 at 1:29 AM, weijie tong <[email protected]>
wrote:
> I am working on pushing down joins to the Druid storage plugin. In my
> experience, you first need to write a rule that uses your storage plugin's
> metadata to decide whether the joins can be pushed down. If they can, you
> transform the join node into a scan node that carries the relevant query
> information. The key point is to run this rule in the HepPlanner.
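A minimal sketch of the join rewrite weijie describes, written as plain Java over a toy plan tree. `Node`, `Scan`, `Join`, `JOIN_CAPABLE_SOURCES` and `pushDownJoin` are hypothetical names made up for illustration; they are not Drill or Calcite classes. In a real plugin this logic would typically live in a Calcite `RelOptRule` that the plugin hands to the planner, and the capability check would come from the plugin's metadata rather than a hard-coded set.

```java
import java.util.Set;

/**
 * Toy model of a join-pushdown rewrite. All types here are hypothetical and
 * only illustrate the idea; they are not Drill or Calcite classes.
 */
public class JoinPushdownSketch {

    /** A node in a simplified logical plan. */
    interface Node {}

    /** A scan against one source; `query` is what the source will be asked to run. */
    record Scan(String source, String query) implements Node {}

    /** An inner join that Drill itself would otherwise evaluate. */
    record Join(Node left, Node right, String condition) implements Node {}

    /** Assumption: sources whose metadata says they can execute joins themselves. */
    static final Set<String> JOIN_CAPABLE_SOURCES = Set.of("druid", "mydb");

    /**
     * If both inputs scan the same join-capable source, replace the Join with a
     * single Scan whose query carries the join; otherwise return the plan unchanged.
     */
    static Node pushDownJoin(Node node) {
        if (node instanceof Join join
                && join.left() instanceof Scan l
                && join.right() instanceof Scan r
                && l.source().equals(r.source())
                && JOIN_CAPABLE_SOURCES.contains(l.source())) {
            String pushed = "SELECT * FROM (" + l.query() + ") a JOIN ("
                    + r.query() + ") b ON " + join.condition();
            return new Scan(l.source(), pushed);
        }
        return node;
    }

    public static void main(String[] args) {
        Node plan = new Join(new Scan("mydb", "SELECT * FROM t1"),
                             new Scan("mydb", "SELECT * FROM t2"),
                             "a.id = b.id");
        // The Join disappears: a single Scan now holds the whole joined query.
        System.out.println(pushDownJoin(plan));
    }
}
```

Joins across different sources, or against a source that is not join-capable, are left as a Join so that Drill evaluates them as usual.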
> On Fri, Mar 24, 2017 at 5:15 AM, Zelaine Fong <[email protected]> wrote:
>
> > The JDBC storage plugin does attempt to do pushdowns of joins. However,
> > the Drill optimizer will evaluate different query plans. In doing so, it
> > may choose an alternative plan that does not do a full pushdown if it
> > believes that’s a less costly plan than a full pushdown. There are a
> > number of open bugs with the JDBC storage plugin, including DRILL-4696.
> > For that particular issue, I believe that when it was investigated, it
> > was determined that the costing model for the JDBC storage plugin needed
> > more work. Hence Drill wasn’t picking the better, full-pushdown plan.
> >
> > -- Zelaine
> >
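To make Zelaine's point concrete, here is a toy model of cost-based plan selection: the optimizer keeps whichever candidate has the lowest *estimated* cost, so an inaccurate estimate for the pushed-down scan can make it choose the partial plan. The plan names and numbers are invented, and `Candidate` and `choose` are hypothetical; Drill's real cost model is considerably more detailed.

```java
import java.util.Comparator;
import java.util.List;

/**
 * Toy illustration of cost-based plan choice. Plan names and cost numbers are
 * made up; this is not Drill's cost model.
 */
public class CostChoiceSketch {

    /** A candidate plan with the cost the optimizer estimates for it. */
    record Candidate(String name, double estimatedCost) {}

    /** The optimizer keeps the candidate with the lowest estimated cost. */
    static Candidate choose(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(Candidate::estimatedCost))
                .orElseThrow();
    }

    public static void main(String[] args) {
        // If the plugin over-estimates the cost of a full pushdown (for example,
        // it ignores the rows the remote join removes), the partial plan wins
        // even when the full pushdown would actually run faster.
        List<Candidate> miscalibrated = List.of(
                new Candidate("full pushdown", 1_000_000),
                new Candidate("scan both tables, join in Drill", 250_000));
        System.out.println(choose(miscalibrated).name());

        // With a better estimate for the pushed-down scan, the full pushdown wins.
        List<Candidate> calibrated = List.of(
                new Candidate("full pushdown", 10_000),
                new Candidate("scan both tables, join in Drill", 250_000));
        System.out.println(choose(calibrated).name());
    }
}
```

The selection logic itself never changes between the two runs; only the estimates do, which is why fixing the plugin's costing is what makes the full pushdown get picked.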
> > On 3/23/17, 1:53 PM, "Paul Rogers" <[email protected]> wrote:
> >
> > Hi Muhammad,
> >
> > It seems that your goal for filters should be achievable; I’m not
> > familiar enough with the code to know whether joins are currently
> > supported, or whether this is where you’d have to contribute to Drill.
> >
> > The storage plugin is called at various places in the planning
> > process, and can insert planning rules. We have plugins that push down
> > filters, so this seems possible. For example, check the Parquet and JDBC
> > plugins for hints. See my answer to a previous question for tips on how
> > to get started with storage plugins.
> >
> > Joins may be a bit more complex. You’d have to insert planner rules;
> > such code *may* be available, or may require extensions to Drill. Drill
> > should certainly do this, so if the code is not there, we’d welcome your
> > contribution.
> >
> > You’d have to create a rule that creates a new scan operator that
> > includes the information you wish to push down. For example, if you push
> > a filter, the scan definition (AKA group scan and scan entry) would need
> > to hold the information needed to implement the push-down. Again, you can
> > probably find examples for filters; you’d have to be creative to push
> > joins.
> >
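Paul's description of carrying the filter in the scan definition can be sketched the same way. `GroupScanSpec`, `Filter`, `sourceSupports` and `pushDownFilter` are hypothetical names; the point is only that the rewritten scan node holds the predicate, so nothing is left for Drill to filter when the source can evaluate it. The set of supported operators is an assumption; a real plugin would derive it from what its data source can evaluate.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of pushing a filter into a scan definition. GroupScanSpec, Filter
 * and the supported-operator check are hypothetical, not Drill APIs.
 */
public class FilterPushdownSketch {

    interface Node {}

    /** Plan-time scan definition: the table to read and the predicates the source applies. */
    record GroupScanSpec(String table, List<String> pushedFilters) implements Node {}

    /** A filter that Drill would otherwise evaluate after the scan. */
    record Filter(Node input, String column, String op, Object value) implements Node {}

    /** Assumption: the data source can evaluate only these comparison operators. */
    static boolean sourceSupports(String op) {
        return op.equals("=") || op.equals("<") || op.equals(">");
    }

    /**
     * Rewrite Filter(Scan) into a Scan whose spec carries the predicate when the
     * source supports it; otherwise return the plan unchanged.
     */
    static Node pushDownFilter(Node node) {
        if (node instanceof Filter f
                && f.input() instanceof GroupScanSpec scan
                && sourceSupports(f.op())) {
            List<String> filters = new ArrayList<>(scan.pushedFilters());
            filters.add(f.column() + " " + f.op() + " " + f.value());
            return new GroupScanSpec(scan.table(), List.copyOf(filters));
        }
        return node;
    }

    public static void main(String[] args) {
        Node plan = new Filter(new GroupScanSpec("orders", List.of()), "amount", ">", 100);
        // The returned scan now carries pushedFilters=[amount > 100].
        System.out.println(pushDownFilter(plan));

        Node unsupported = new Filter(new GroupScanSpec("orders", List.of()), "name", "LIKE", "'a%'");
        // true: LIKE is not in the assumed supported set, so the Filter stays in Drill.
        System.out.println(pushDownFilter(unsupported) instanceof Filter);
    }
}
```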
> > Assembling the pieces: your plugin would add planner rules that
> > determine when joins can be pushed. Those rules would cause your plugin
> > to create a semantic node (group scan) that holds the required
> > information. The planner then converts group scan nodes to specific plans
> > passed to the execution engine. On the execution side, your plugin
> > provides a “Record Reader” for your format, and that reader does the
> > actual work to push the filter or join down to your data source.
> >
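On the execution side, the reader Paul mentions is where the pushed-down work actually reaches the data source. The sketch below assumes the source accepts plain SQL and models its client as a `Function` from query text to rows; `ScanSpec`, `SimpleReader` and `toQuery` are hypothetical names, and Drill's real `RecordReader` interface (which fills value vectors in batches) looks quite different.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Toy model of the execution side: a reader that receives the plan-time scan
 * spec and lets the data source do the filtering. All names are hypothetical.
 */
public class ReaderSketch {

    /** Plan-time scan definition handed to the reader. */
    record ScanSpec(String table, List<String> pushedFilters) {}

    /** Translate the spec into the source's query language (plain SQL assumed). */
    static String toQuery(ScanSpec spec) {
        String sql = "SELECT * FROM " + spec.table();
        if (!spec.pushedFilters().isEmpty()) {
            sql += " WHERE " + String.join(" AND ", spec.pushedFilters());
        }
        return sql;
    }

    /** Iterates over rows that the source has already filtered. */
    static final class SimpleReader implements Iterator<Map<String, Object>> {
        private final Iterator<Map<String, Object>> rows;

        SimpleReader(ScanSpec spec, Function<String, List<Map<String, Object>>> client) {
            // The pushed-down work happens here: the source evaluates the WHERE
            // clause, so Drill only receives matching rows.
            this.rows = client.apply(toQuery(spec)).iterator();
        }

        @Override
        public boolean hasNext() {
            return rows.hasNext();
        }

        @Override
        public Map<String, Object> next() {
            return rows.next();
        }
    }

    public static void main(String[] args) {
        ScanSpec spec = new ScanSpec("orders", List.of("amount > 100", "region = 'EU'"));
        // SELECT * FROM orders WHERE amount > 100 AND region = 'EU'
        System.out.println(toQuery(spec));

        // A fake client that echoes the query it received as a single row.
        Function<String, List<Map<String, Object>>> fakeClient =
                query -> List.of(Map.of("query", query));
        new SimpleReader(spec, fakeClient).forEachRemaining(System.out::println);
    }
}
```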
> > Your best bet is to mine existing plugins for ideas, and then
> > experiment. Start simply and gradually add functionality. And, ask
> > questions back on this list.
> >
> >
> > Thanks,
> >
> > - Paul
> >
> > > On Mar 22, 2017, at 8:20 AM, Muhammad Gelbana <[email protected]> wrote:
> > >
> > > I'm trying to use Drill with a proprietary datasource that is very fast
> > > at applying joins (i.e., SQL joins) and query filters (i.e., SQL WHERE
> > > conditions).
> > >
> > > To connect to that datasource, I first have to write a storage plugin,
> > > but I'm not sure whether my main goal is achievable.
> > >
> > > My main goal is to configure Drill to let the datasource perform joins
> > > and filters and return only the resulting data. Drill can then perform
> > > any further processing based on the original SQL query sent to Drill.
> > >
> > > Is this possible by developing a storage plugin? Where exactly should I
> > > be looking?
> > >
> > > I've been going through this wiki
> > > <https://github.com/paul-rogers/drill/wiki> and I don't think I
> > > understood every concept. So if there is another source of information
> > > about storage plugin development, please point it out.
> > >
> > > *---------------------*
> > > *Muhammad Gelbana*
> > > http://www.linkedin.com/in/mgelbana