​I believe ​clhubert is referring to this discussion
<http://drill-user.incubator.apache.narkive.com/TIXWiTY4/cartesian-product-in-apache-drill#post1>
.

So why Drill doesn't transform this query into a nested join query ? Simply
because there is no Calcite rule to transform it into a nested loop join ?
Is it not technically possible to write such Rule or is it feasible so I
may take on this challenge ?

Also pardon me for repeating my question but I fail to find an answer in
your replies, why doesn't Drill just run a cartesian join ? Because it's
expensive regarding resources (i.e. CPU\Network\RAM) ?

Thanks a lot Shadi for the query, it works for me.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Mon, May 8, 2017 at 6:10 AM, Shadi Khalifa <[email protected]> wrote:

> Hi Muhammad,
>
> I did the following as a workaround to have Cartesian product. The basic
> idea is to create a dummy column on the fly that has the value 1 in both
> tables and then join on that column leading to having a match of every row
> of the first table with every row of the second table, hence do a Cartesian
> product. This might not be the most efficient way but it will do the job.
>
> *Original Query:*
> SELECT * FROM
> ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT
> 2147483647) `t0`
> INNER JOIN
> ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT
> 2147483647) `t1`
> ON (`t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`)
> LIMIT 2147483647
>
> *Workaround (add columns **d1a381f3g73 and **d1a381f3g74 to tables one
> and two, respectively. Names don't really matter, just need to be unique):*
> SELECT * FROM
> ( SELECT *1 as d1a381f3g73*, 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t0`
> INNER JOIN
> ( SELECT *1 as d1a381f3g74*, 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1`
> ON (`t0`.*d1a381f3g73 = *`t1`.*d1a381f3g74*)
> WHERE `t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`
> LIMIT 2147483647
>
> Regards
>
>
> *Shadi Khalifa, PhD*
> Postdoctoral Fellow
> Cognitive Analytics Development Hub
> Centre for Advanced Computing
> Queen’s University
> (613) 533-6000 x78347
> http://cac.queensu.ca
>
> I'm just a neuron in the society collective brain
>
> *Join us for HPCS in June 2017! Register at:*  *http://2017.hpcs.ca/
> <http://2017.hpcs.ca/>*
>
> P Please consider your environmental responsibility before printing this
> e-mail
>
> *01001001 00100000 01101100 01101111 01110110 01100101 00100000 01000101
> 01100111 01111001 01110000 01110100 *
>
> *The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential material. Any review or
> dissemination of this information by persons other than the intended
> recipient is prohibited. If you received this in error, please contact the
> sender and delete the material from any computer. Thank you.*
>
>
>
> On Saturday, May 6, 2017 6:05 PM, Muhammad Gelbana <[email protected]>
> wrote:
>
>
> ​​
> Here it is:
>
> SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
> LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON (
> ​​
> `t0`.`UserID` IS NOT DISTINCT FROM
> ​​
> `t1`.`UserID`) LIMIT 2147483647
>
> I debugged Drill code and found it decomposes *IS NOT DISTINCT FROM* into
> ​
> *`t0`.`UserID` = ​`t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID`
> IS NULL**)* while checking if the query is a cartesian join, and when the
> check returns true, it throws an excetion saying: *This query cannot be
> planned possibly due to either a cartesian join or an inequality join*
>
>
> *---------------------*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Sat, May 6, 2017 at 6:53 PM, Gautam Parai <[email protected]> wrote:
>
> > Can you please specify the query you are trying to execute?
> >
> >
> > Gautam
> >
> > ________________________________
> > From: Muhammad Gelbana <[email protected]>
> > Sent: Saturday, May 6, 2017 7:34:53 AM
> > To: [email protected]; [email protected]
> > Subject: Running cartesian joins on Drill
> >
> > Is there a reason why Drill would intentionally reject cartesian join
> > queries even if *planner.enable_nljoin_for_scalar_only* is disabled ?
> >
> > Any ideas how could a query be rewritten to overcome this restriction ?
> >
> > *---------------------*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
>
>
>

Reply via email to