Yes, the RelMdMaxRowCount statistic would be useful for this.  Thanks for
pointing that out.   I'll see if we can leverage that.  The grouping sets
is not yet supported in Drill, but that would be a better solution since it
avoids extra scans.

-Aman

On Wed, Jul 26, 2017 at 11:25 AM, Julian Hyde <jh...@apache.org> wrote:

> Aman,
>
> Thanks for moving dev@calcite to Bcc. This is properly a Drill question.
>
> A blanket restriction on cartesian joins is a blunt instrument. Sometimes
> cartesian joins are valid, safe, and the best plan for a query. This is a
> case in point. Users shouldn’t have to change config parameters to get it
> to work.
>
> (Actually I don’t know the query, but
>
>   select count(distinct deptno), count(distinct gender) from emp
>
> is equivalent.)
>
> Drill should detect that a relational expression can return at most one
> row, and allow a cartesian join if one side is such. Calcite has a
> RelMdMaxRowCount statistic for this. This was added as part of
> http://issues.apache.org/jira/browse/CALCITE-604 <
> http://issues.apache.org/jira/browse/CALCITE-604>. This rule is 100%
> safe. No config parameters required.
>
> Also, Calcite has an alternative way of handling multiple distinct
> aggregates that rewrites to use grouping sets. It doesn’t generate
> self-joins, cartesian or otherwise.  http://issues.apache.org/jira/
> browse/CALCITE-732 <http://issues.apache.org/jira/browse/CALCITE-732>.
>
> Julian
>
>
>
>
>
>
> > On Jul 26, 2017, at 9:20 AM, Aman Sinha <amansi...@apache.org> wrote:
> >
> > [Since this is Drill specific, I put dev@calcite on BCC].
> >
> > If you have two aggregates: Count(distinct a), Count(distinct b), the
> > Calcite logical plan consists of a cartesian join of 2 subqueries each of
> > which first does a group-by on the distinct column followed by a count
> > aggregate.   By default,  Drill only processes cartesian join if one
> input
> > of the join is known to be scalar (single row).  It sounds like after you
> > did the transformation to use the cache, that scalar property somehow did
> > not get propagated.
> > You can override this behavior by a session configuration:  (this will
> use
> > a nested loop join even if inputs are not provably scalar, but it should
> be
> > used for specific query only).    For a more general solution, I believe
> > you may have to create an enhancement JIRA with appropriate details.
> >   'alter session set planner.enable_nljoin_for_scalar_only = false';
> >
> > On Wed, Jul 26, 2017 at 4:14 AM, weijie tong <tongweijie...@gmail.com>
> > wrote:
> >
> >> HI all:
> >>
> >>  I materialize the count distinct query result to a cache, then when
> user
> >> query the count distinct , a specific rule will translate the query to
> the
> >> cache. It turns out right when the query has only one count (distinct )
> >> operator ,but when it has two count (distinct ) ,it causes error .The
> error
> >> info is here:
> >> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269
> >>
> >>
> >> Best Regards.
> >>
>
>

Reply via email to