I'm focusing on JOINs now, specifically a query such as this: *SELECT * FROM
TABLE1, TABLE2*. Drill plans to transform this into 2 separate full scan
queries and then perform the cartesian product join on its own. I'm trying
to make Drill send the query as it is in a single scan (group scan?).
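
To make the goal concrete, here is a minimal, self-contained Java sketch of the rewrite I am after. None of these classes are Drill or Calcite APIs: Scan, Join and pushDownJoin are hypothetical names, used only to illustrate merging two scans of the same source into a single scan that carries the whole query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy plan model. These are hypothetical classes, NOT Drill or Calcite APIs.
final class JoinPushdownSketch {

    interface Node {}

    // A scan of one or more tables that all live in the same data source.
    static final class Scan implements Node {
        final String source;
        final List<String> tables;

        Scan(String source, List<String> tables) {
            this.source = source;
            this.tables = tables;
        }

        // The query text that would be sent to the data source.
        String toSql() {
            return "SELECT * FROM " + String.join(", ", tables);
        }
    }

    // A join with no condition, i.e. a cartesian product of its inputs.
    static final class Join implements Node {
        final Node left;
        final Node right;

        Join(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    // Rewrites bottom-up: a join of two scans on the same source becomes
    // one scan over both tables; anything else is left as a join.
    static Node pushDownJoin(Node node) {
        if (!(node instanceof Join)) {
            return node;
        }
        Join join = (Join) node;
        Node left = pushDownJoin(join.left);
        Node right = pushDownJoin(join.right);
        if (left instanceof Scan && right instanceof Scan) {
            Scan l = (Scan) left;
            Scan r = (Scan) right;
            if (l.source.equals(r.source)) {
                List<String> merged = new ArrayList<>(l.tables);
                merged.addAll(r.tables);
                return new Scan(l.source, merged);
            }
        }
        return new Join(left, right);
    }

    public static void main(String[] args) {
        Node plan = new Join(
                new Scan("jdbc", Arrays.asList("TABLE1")),
                new Scan("jdbc", Arrays.asList("TABLE2")));
        Node rewritten = pushDownJoin(plan);
        // prints: SELECT * FROM TABLE1, TABLE2
        System.out.println(((Scan) rewritten).toSql());
    }
}
```

In Drill the equivalent would presumably be a planner rule that replaces the join with a group scan. The sketch only shows the shape of the transformation, not how to register such a rule.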

@weijie

I've found that if I disable the JDBC plugin's JdbcDrelConverterRule rule (i.e.
JdbcStoragePlugin.DrillJdbcConvention.DrillJdbcConvention), an exception is
thrown because Drill refuses to plan cartesian product joins. Are you
saying that I need to keep that rule, let Drill plan the query as 2 separate
group scans, and then change this plan to merge these 2 group scans
into one?

Is there a way to make Drill accept planning cartesian product joins?

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Sun, Mar 26, 2017 at 1:33 AM, Muhammad Gelbana <m.gelb...@gmail.com>
wrote:

> Priceless information! Thank you all.
>
> I managed to debug Drill in Eclipse hoping to get a better understanding
> but I can't get my head around some stuff:
>
>    - What is the purpose of these classes/interfaces:
>       - ConverterRule
>       - DrillRel
>       - Prel
>       - JdbcStoragePlugin.JdbcPrule
>       - JdbcIntermediatePrel
>    - What do the words *Prel* and *Prule* stand for? *Prel*iminary and
>    *P*reliminary *Rule*?
>    - What is a calling convention? (It is mentioned in *ConverterRule*'s
>    documentation.)
>
> Is there a way to configure the costing model for the JDBC plugin without
> having to customize it through code? After all, my ultimate goal is to
> push down filters and joins.
>
> I'll continue debugging/browsing the code and come back with more
> questions, or hopefully an achievement!
>
> Thanks again, your help is very much appreciated.
>
> *---------------------*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Fri, Mar 24, 2017 at 1:29 AM, weijie tong <tongweijie...@gmail.com>
> wrote:
>
>> I am working on pushing down joins to the Druid storage plugin. In my
>> experience, you first need to write a rule that checks, using your storage
>> plugin's metadata, whether the join can be pushed down. If it can, you
>> transform the join node into a scan node that carries the query-relevant
>> information. The key point is to apply this rule in the HepPlanner.
>>
>> On Fri, Mar 24, 2017 at 5:15 AM, Zelaine Fong <zf...@mapr.com> wrote:
>>
>> > The JDBC storage plugin does attempt to do pushdowns of joins.  However,
>> > the Drill optimizer will evaluate different query plans.  In doing so, it
>> > may choose an alternative plan that does not do a full pushdown if it
>> > believes that’s a less costly plan than a full pushdown.  There are a
>> > number of open bugs with the JDBC storage plugin, including DRILL-4696.
>> > For that particular issue, I believe that when it was investigated, it was
>> > determined that the costing model for the JDBC storage plugin needed more
>> > work.  Hence Drill wasn’t picking the more optimal full pushdown plan.
>> >
>> > -- Zelaine
>> >
>> > On 3/23/17, 1:53 PM, "Paul Rogers" <prog...@mapr.com> wrote:
>> >
>> >     Hi Muhammad,
>> >
>> >     It seems that the goal for filters should be possible; I’m not
>> >     familiar enough with the code to know if joins are currently supported,
>> >     or if this is where you’d have to make some contributions to Drill.
>> >
>> >     The storage plugin is called at various places in the planning
>> >     process, and can insert planning rules. We have plugins that push down
>> >     filters, so this seems possible. For example, check Parquet and JDBC for
>> >     hints. See my answer to a previous question for hints on how to get
>> >     started with storage plugins.
>> >
>> >     Joins may be a bit more complex. You’d have to insert planner rules;
>> >     such code *may* be available, or may require extensions to Drill. Drill
>> >     should certainly do this, so if the code is not there, we’d welcome your
>> >     contribution.
>> >
>> >     You’d have to create a rule that creates a new scan operator that
>> >     includes the information you wish to push down. For example, if you push
>> >     a filter, the scan definition (AKA group scan and scan entry) would need
>> >     to hold the information needed to implement the push-down. Again, you can
>> >     probably find examples of filters; you’d have to be creative to push
>> >     joins.
>> >
>> >     Assembling the pieces: your plugin would add planner rules that
>> >     determine when joins can be pushed. Those rules would cause your plugin
>> >     to create a semantic node (group scan) that holds the required
>> >     information. The planner then converts group scan nodes to specific plans
>> >     passed to the execution engine. On the execution side, your plugin
>> >     provides a “Record Reader” for your format, and that reader does the
>> >     actual work to push the filter or join down to your data source.
>> >
>> >     Your best bet is to mine existing plugins for ideas, and then
>> > experiment. Start simply and gradually add functionality. And, ask
>> > questions back on this list.
>> >
>> >
>> >     Thanks,
>> >
>> >     - Paul
>> >
>> >     > On Mar 22, 2017, at 8:20 AM, Muhammad Gelbana <m.gelb...@gmail.com> wrote:
>> >     >
>> >     > I'm trying to use Drill with a proprietary datasource that is very fast in
>> >     > applying data joins (i.e. SQL joins) and query filters (i.e. SQL where
>> >     > conditions).
>> >     >
>> >     > To connect to that datasource, I first have to write a storage plugin, but
>> >     > I'm not sure if my main goal is applicable.
>> >     >
>> >     > My main goal is to configure Drill to let the datasource perform JOINs and
>> >     > filters and only return the data. Then Drill can perform further processing
>> >     > based on the original SQL query sent to Drill.
>> >     >
>> >     > Is this possible by developing a storage plugin? Where exactly should I be
>> >     > looking?
>> >     >
>> >     > I've been going through this wiki
>> >     > <https://github.com/paul-rogers/drill/wiki> and I don't think I understood
>> >     > every concept. So if there is another source of information about storage
>> >     > plugin development, please point it out.
>> >     >
>> >     > *---------------------*
>> >     > *Muhammad Gelbana*
>> >     > http://www.linkedin.com/in/mgelbana
>> >
>> >
>> >
>> >
>>
>
>
