Re: Why are nested aggregations illegal? Best alternatives?

Gavin Ray Sat, 12 Feb 2022 12:47:01 -0800

Forgive my ignorance/lack of experience

I am somewhat familiar with the ARRAY() function, but not sure I know the
term "correlated"
Searching the Calcite codebase for uses of "correlated" + "query", I found:


https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612

  @Test void testCorrelatedSubQueryInJoin() {
    final String sql = "select *\n"
        + "from emp as e\n"
        + "join dept as d using (deptno)\n"
        + "where d.name = (\n"
        + "  select max(name)\n"
        + "  from dept as d2\n"
        + "  where d2.deptno = d.deptno)";
    sql(sql).withExpand(false).ok();
  }

But I also see this, which says it is "uncorrelated" but seems very similar?

  @Test void testInUncorrelatedSubQuery() {
    final String sql = "select empno from emp where deptno in"
        + " (select deptno from dept)";
    sql(sql).ok();
  }

I wouldn't blame you for not answering such a basic question -- but what
exactly does "correlation" mean here?

Thanks, as usual Julian


On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> Correlated ARRAY sub-query?
>
> > On Feb 12, 2022, at 10:40 AM, Gavin Ray <ray.gavi...@gmail.com> wrote:
> >
> > Apologies for the delay in replying
> >
> > This makes things clear and seems obvious now that you point it out.
> > Thank you, Justin and Julian =)
> >
> > Let me ask another question (if I may) that I am struggling to phrase
> > easily.
> >
> > So with GraphQL, you might have a query like:
> > - "Get houses"
> > - "For each house get the user that lives in the house
> > - "And for each user get their list of todos"
> >
> > The result has to come back such that it's a single object for each row
> > ==============
> > {
> >  houses: [{
> >    address: "123 Main Street",
> >    users: [{
> >      name: "Joe",
> >      todos: [{
> >        description: "Take out trash"
> >      }]
> >    }]
> > }
> >
> > From a SQL perspective, the logical equivalent would be something like:
> > ==============
> > SELECT
> >    house.address,
> >    (somehow nest users + double-nest todos under user)
> > FROM
> >    house
> > JOIN
> >    user ON user.house_id = house.id
> >    todos ON todos.user_id = user.id
> > WHERE
> >    house.id = 1
> >
> > I'm not familiar enough with SQL to have figured out a way to make this
> > kind of
> > query using operators that are supported across most of the DB's Calcite
> has
> > adapters for.
> >
> > Currently what I have done instead, on a tip from Gopalakrishna Holla
> from
> > LinkedIn Coral team who has built GraphQL-on-Calcite, was to break up the
> > query
> > into individual parts and then do the join in-memory:
> >
> > SELECT ... FROM users;
> > SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev
> response);
> >
> > However, the way I am doing this seems like it's probably very
> inefficient.
> > Because I do a series of nested loops to add the key to each object in
> the
> > parent ResultSet row:
> >
> >
> https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153
> >
> > Is there some better way of doing this?
> > I would be eternally grateful for any advice.
> >
> >
> > On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <jhyde.apa...@gmail.com>
> wrote:
> >
> >> Yes, if you want to do multiple layers of aggregation, use CTEs (WITH)
> or
> >> nested sub-queries. For example, the following is I believe valid
> standard
> >> SQL, and actually computes something useful:
> >>
> >>  WITH q1 AS
> >>   (SELECT deptno, job, AVG(sal) AS avg_sal
> >>    FROM emp
> >>    GROUP BY deptno, job)
> >>  WITH q2 AS
> >>   (SELECT deptno, AVG(avg_sal) AS avg_avg_sal
> >>    FROM q1
> >>    GROUP BY deptno)
> >>  SELECT AVG(avg_avg_sal)
> >>  FROM q2
> >>  GROUP BY ()
> >>
> >> (You can omit the “GROUP BY ()” line, but I think it makes things
> clearer.)
> >>
> >> Julian
> >>
> >>
> >>
> >>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart <greenl...@gmail.com>
> >> wrote:
> >>>
> >>> I wish you could unsend emails :)  Answering my own question, no,
> because
> >>> that would return three rows with the average :D
> >>>
> >>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <greenl...@gmail.com>
> >> wrote:
> >>>
> >>>> Just out of curiosity, is the second level aggregation using AVG in a
> >>>> window context?  It the frame is the whole table and it aggregates
> over
> >> it?
> >>>>
> >>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <greenl...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> That is really neat about Oracle.
> >>>>>
> >>>>> The alternative in general is to use a subquery:
> >>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
> >>>>> becomes
> >>>>> select avg(the_avg)
> >>>>> from (select avg(sal) from emp group b deptno) an_alias;
> >>>>>
> >>>>> or
> >>>>>
> >>>>> with the_cte as (select avg(sal) x from emp group by deptno)
> >>>>> select avg(x) from the_cte;
> >>>>>
> >>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <jhyde.apa...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Some databases, e.g. Oracle, allow TWO levels of nesting:
> >>>>>>
> >>>>>> SELECT avg(sal) FROM emp GROUP BY deptno;
> >>>>>>
> >>>>>> AVG(SAL)
> >>>>>> ========
> >>>>>> 1,566.67
> >>>>>> 2,175.00
> >>>>>> 2,916.65
> >>>>>>
> >>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
> >>>>>>
> >>>>>> AVG(SUM(SAL))
> >>>>>> =============
> >>>>>>       9,675
> >>>>>>
> >>>>>> The first level aggregates by department (returning 3 records), and
> >> the
> >>>>>> second level computes the grand total (returning 1 record). But that
> >> is an
> >>>>>> exceptional case.
> >>>>>>
> >>>>>> Generally, any expression in the SELECT or HAVING clause of an
> >> aggregate
> >>>>>> query is either ‘before’ or ‘after’ aggregation. Consider
> >>>>>>
> >>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b
> >>>>>> FROM t
> >>>>>> GROUP BY t.x
> >>>>>>
> >>>>>> The expressions “t.y” and “t.y + 3” occur before aggregation; “t.x”,
> >>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after
> >> aggregation.
> >>>>>> SQL semantics rely heavily on this stratification. Allowing an extra
> >> level
> >>>>>> of aggregation would mess it all up.
> >>>>>>
> >>>>>> Julian
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <greenl...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> This is a SQL limitation.
> >>>>>>>
> >>>>>>> mysql> select sum(1);
> >>>>>>> +--------+
> >>>>>>> | sum(1) |
> >>>>>>> +--------+
> >>>>>>> |      1 |
> >>>>>>> +--------+
> >>>>>>> 1 row in set (0.00 sec)
> >>>>>>>
> >>>>>>> mysql> select sum(sum(1));
> >>>>>>> ERROR 1111 (HY000): Invalid use of group function
> >>>>>>>
> >>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <ray.gavi...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> Went to test this query out and found that it can't be performed:
> >>>>>>>>
> >>>>>>>> SELECT
> >>>>>>>>  JSON_OBJECT(
> >>>>>>>>      KEY 'users'
> >>>>>>>>      VALUE JSON_ARRAYAGG(
> >>>>>>>>          JSON_OBJECT(
> >>>>>>>>              KEY 'name' VALUE "users"."name",
> >>>>>>>>              KEY 'todos' VALUE JSON_ARRAYAGG(
> >>>>>>>>                  JSON_OBJECT(
> >>>>>>>>                      KEY 'description' VALUE "todos"."description"
> >>>>>>>>                  )
> >>>>>>>>              )
> >>>>>>>>          )
> >>>>>>>>      )
> >>>>>>>>  )
> >>>>>>>> FROM
> >>>>>>>>  "users"
> >>>>>>>> LEFT OUTER JOIN
> >>>>>>>>  "todos" ON "users"."id" = "todos"."user_id";
> >>>>>>>>
> >>>>>>>> Checking the source, seems this is a blanket policy, not a
> >>>>>>>> datasource-specific thing.
> >>>>>>>> From a functional perspective, it doesn't feel like it's much
> >>>>>> different
> >>>>>>>> from JOINs
> >>>>>>>> But I don't understand relational theory or DB functionality in
> the
> >>>>>> least,
> >>>>>>>> so I'm not fit to judge.
> >>>>>>>>
> >>>>>>>> Just curious why Calcite doesn't allow this
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>
> >>
>
>

Re: Why are nested aggregations illegal? Best alternatives?

Reply via email to