Re: Why are nested aggregations illegal? Best alternatives?

Gavin Ray Wed, 23 Feb 2022 15:47:56 -0800

Well actually, there is one exception -- there's a small bug with the
output being improperly escaped but that bug is already on JIRA and seems
much easier to approach.


https://issues.apache.org/jira/browse/CALCITE-4989

On Wed, Feb 23, 2022 at 6:46 PM Gavin Ray <ray.gavi...@gmail.com> wrote:

> Oh wow, that fixed it!
>
> I changed that line in the parser to be Expression(ExprContext.ACCEPT_ALL)
> And now my nested JSON queries work!
>
> The minimal reproduction is:
>
> SELECT JSON_OBJECT(
>     KEY 'id'
>     VALUE (SELECT 1)
> )
>
> +----------+
> |  EXPR$0  |
> +----------+
> | {"id":1} |
> +----------+
> 1 row selected (0.01 seconds)
>
> On Wed, Feb 23, 2022 at 6:30 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
>
>> Okay, I think maybe I have an idea what is going on
>>
>> The parser grammar for "JsonNameAndValue":
>>
>> https://github.com/apache/calcite/blob/82dd78a14f6aef2eeec2f9c94978d04b4acc5359/core/src/main/codegen/templates/Parser.jj#L6267-L6296
>>
>> Has:
>> e = Expression(ExprContext.ACCEPT_NON_QUERY)
>>
>> This enum value appears to be the one that says it can't contain queries:
>>
>> https://github.com/apache/calcite/blob/82dd78a14f6aef2eeec2f9c94978d04b4acc5359/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L328-L331
>>
>>     /**
>>      * Accept only non-query expressions in this context.
>>      */
>>     ACCEPT_NON_QUERY,
>>
>> So it's not a bug, but Calcite's parser just doesn't take query
>> expressions for JSON values.
>>
>> I guess that changing this might break a bunch of stuff, and probably is
>> very difficult?
>>
>> On Wed, Feb 23, 2022 at 6:10 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
>>
>>> Sorry, I realized this might be a time investment to try to
>>> debug/reproduce
>>> So I found a website that let me create the schema, rows, and query on
>>> various DB's
>>>
>>> Here is an example on Oracle and MySQL that you can view/modify in your
>>> browser:
>>>
>>> ORACLE:
>>> https://dbfiddle.uk/?rdbms=oracle_21&fiddle=2eed69c44fa63adf9830213163ba73d0
>>> MYSQL:
>>> https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=f94c7957eae4f5ebe4c879c17fbe64ea
>>>
>>> On Wed, Feb 23, 2022 at 2:59 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
>>>
>>>> I think this actually is a bug in Calcite's parser or it's
>>>> interpretation
>>>> I tested on H2, Oracle, and MySQL, the below is valid on those DB's but
>>>> fails on Calcite's parser:
>>>>
>>>> =================
>>>> H2/Oracle
>>>> =================
>>>> SELECT
>>>>     "houses"."id" AS "id",
>>>>     "houses"."address" AS "address",
>>>>     (
>>>>         SELECT
>>>>             JSON_ARRAYAGG(JSON_OBJECT(
>>>>                 KEY 'id' VALUE "users"."id",
>>>>                 KEY 'name' VALUE "users"."name",
>>>>                 KEY 'todos' VALUE (
>>>>                     SELECT
>>>>                         JSON_ARRAYAGG(JSON_OBJECT(
>>>>                             KEY 'id' VALUE "todos"."id",
>>>>                             KEY 'description' VALUE
>>>> "todos"."description"
>>>>                         ))
>>>>                     FROM
>>>>                         "todos"
>>>>                     WHERE
>>>>                         "todos"."user_id" = "users"."id"
>>>>                 )
>>>>             ))
>>>>         FROM
>>>>             "users"
>>>>         WHERE
>>>>             "users"."house_id" = "houses"."id"
>>>>     ) "users"
>>>> FROM
>>>>     "houses"
>>>> GROUP BY
>>>>     "houses"."id",
>>>>     "houses"."address"
>>>>
>>>> =================
>>>> MySQL
>>>> =================
>>>> SELECT
>>>>     `houses`.`id` AS `id`,
>>>>     `houses`.`address` AS `address`,
>>>>     (
>>>>         SELECT
>>>>             JSON_ARRAYAGG(JSON_OBJECT(
>>>>                 'id', `users`.`id`,
>>>>                 'name', `users`.`name`,
>>>>                 'todos', (
>>>>                     SELECT
>>>>                         JSON_ARRAYAGG(JSON_OBJECT(
>>>>                             'id', `todos`.`id`,
>>>>                             'description', `todos`.`description`
>>>>                         ))
>>>>                     FROM
>>>>                         `todos`
>>>>                     WHERE
>>>>                         `todos`.`user_id` = `users`.`id`
>>>>                 )
>>>>             ))
>>>>         FROM
>>>>             `users`
>>>>         WHERE
>>>>             `users`.`house_id` = `houses`.`id`
>>>>     ) `users`
>>>> FROM
>>>>     `houses`
>>>> GROUP BY
>>>>     `houses`.`id`,
>>>>     `houses`.`address`
>>>>
>>>> On Wed, Feb 23, 2022 at 12:52 PM Gavin Ray <ray.gavi...@gmail.com>
>>>> wrote:
>>>>
>>>>> That's a valid point, let me see check what some common dialects will
>>>>> accept for this sort of thing
>>>>>
>>>>> On Wed, Feb 23, 2022 at 12:36 PM Julian Hyde <jhyde.apa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> It’s a parser error. That usually means that the user has made a
>>>>>> mistake.
>>>>>>
>>>>>> Try your SQL on another DB with JSON support before you declare this
>>>>>> a Calcite bug.
>>>>>>
>>>>>> Julian
>>>>>>
>>>>>> > On Feb 23, 2022, at 09:22, Gavin Ray <ray.gavi...@gmail.com> wrote:
>>>>>> >
>>>>>> > No dice still unfortunately =/
>>>>>> >
>>>>>> > If it's any easier, I put a self-contained single class
>>>>>> reproduction on
>>>>>> > Github:
>>>>>> >
>>>>>> https://github.com/GavinRay97/calcite-nested-json-subquery-bug/blob/main/src/test/java/com/example/AppTest.java
>>>>>> >
>>>>>> > Is it worth filing a JIRA ticket over you think?
>>>>>> >
>>>>>> >
>>>>>> >> On Wed, Feb 23, 2022 at 3:45 AM Julian Hyde <
>>>>>> jhyde.apa...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Try ‘value ((‘ in place of ‘value (‘.
>>>>>> >>
>>>>>> >> Julian
>>>>>> >>
>>>>>> >>>> On Feb 21, 2022, at 9:33 AM, Gavin Ray <ray.gavi...@gmail.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> I hadn't thought about the fact that ORM's probably have to
>>>>>> solve this
>>>>>> >>> problem as well
>>>>>> >>> That is a great suggestion, I will try to investigate some of the
>>>>>> popular
>>>>>> >>> ORM codebases and see if there are any tricks they are using.
>>>>>> >>>
>>>>>> >>> I seem to maybe be getting a tiny bit closer by using subqueries
>>>>>> like
>>>>>> >>> Julian suggested instead of operator calls
>>>>>> >>> But if I may ask what is probably a very stupid question:
>>>>>> >>>
>>>>>> >>> What might the error message
>>>>>> >>> "parse failed: Query expression encountered in illegal context
>>>>>> >>> (state=,code=0)"
>>>>>> >>>
>>>>>> >>> Mean in the below query?
>>>>>> >>>
>>>>>> >>> The reason why I am confused is because the query runs if I
>>>>>> remove the
>>>>>> >>> innermost subquery ("todos")
>>>>>> >>> But the innermost subquery is a direct copy-paste of the subquery
>>>>>> above
>>>>>> >> it,
>>>>>> >>> so I know it MUST be valid
>>>>>> >>>
>>>>>> >>> As usual, thank you so much for your help/guidance Stamatis.
>>>>>> >>>
>>>>>> >>> select
>>>>>> >>> "g0"."id" "id",
>>>>>> >>> "g0"."address" "address",
>>>>>> >>> (
>>>>>> >>>   select json_arrayagg(json_object(
>>>>>> >>>     key 'id' value "g1"."id",
>>>>>> >>>     key 'todos' value (
>>>>>> >>>       select json_arrayagg(json_object(
>>>>>> >>>         key 'id' value "g2"."id",
>>>>>> >>>         key 'description' value "g2"."description",
>>>>>> >>>       ))
>>>>>> >>>       from (
>>>>>> >>>         select * from "todos"
>>>>>> >>>         where "g1"."id" = "user_id"
>>>>>> >>>         order by "id"
>>>>>> >>>       ) "g2"
>>>>>> >>>     )
>>>>>> >>>   ))
>>>>>> >>>   from (
>>>>>> >>>     select * from "users"
>>>>>> >>>     where "g0"."id" = "house_id"
>>>>>> >>>     order by "id"
>>>>>> >>>   ) "g1"
>>>>>> >>> ) "users"
>>>>>> >>> from "houses" "g0"
>>>>>> >>> order by "g0"."id"
>>>>>> >>>
>>>>>> >>>> On Mon, Feb 21, 2022 at 8:07 AM Stamatis Zampetakis <
>>>>>> zabe...@gmail.com>
>>>>>> >>>> wrote:
>>>>>> >>>>
>>>>>> >>>> Hi Gavin,
>>>>>> >>>>
>>>>>> >>>> A few more comments in case they help to get you a bit further
>>>>>> on your
>>>>>> >>>> work.
>>>>>> >>>>
>>>>>> >>>> The need to return the result as a single object is a common
>>>>>> problem in
>>>>>> >>>> object relational mapping (ORM) frameworks/APIS (JPA,
>>>>>> Datanucleus,
>>>>>> >>>> Hibernate, etc.). Apart from the suggestions so far maybe you
>>>>>> could look
>>>>>> >>>> into these frameworks as well for more inspiration.
>>>>>> >>>>
>>>>>> >>>> Moreover your approach of decomposing the query into individual
>>>>>> parts is
>>>>>> >>>> commonly known as the N+1 problem [1].
>>>>>> >>>>
>>>>>> >>>> Lastly, keep in mind that you can introduce custom UDF, UDAF
>>>>>> functions
>>>>>> >> if
>>>>>> >>>> you need more flexibility on reconstructing the final result.
>>>>>> >>>>
>>>>>> >>>> Best,
>>>>>> >>>> Stamatis
>>>>>> >>>>
>>>>>> >>>> [1]
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>
>>>>>> https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping
>>>>>> >>>>
>>>>>> >>>>> On Sun, Feb 13, 2022 at 3:59 AM Gavin Ray <
>>>>>> ray.gavi...@gmail.com>
>>>>>> >> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Ah wait nevermind, got excited and spoke too soon. Looking at
>>>>>> it more
>>>>>> >>>>> closely, that data isn't correct.
>>>>>> >>>>> At least it's in somewhat the right shape, ha!
>>>>>> >>>>>
>>>>>> >>>>>> On Sat, Feb 12, 2022 at 9:57 PM Gavin Ray <
>>>>>> ray.gavi...@gmail.com>
>>>>>> >> wrote:
>>>>>> >>>>>
>>>>>> >>>>>> After ~5 hours, I think I may have made some progress =)
>>>>>> >>>>>>
>>>>>> >>>>>> I have this, which currently works. The problem is that the
>>>>>> nested
>>>>>> >>>>> columns
>>>>>> >>>>>> don't have names on them.
>>>>>> >>>>>> Since I need to return a nested "Map<String, Object>", I have
>>>>>> to
>>>>>> >> figure
>>>>>> >>>>>> out how to convert this query into a form that gives column
>>>>>> names.
>>>>>> >>>>>>
>>>>>> >>>>>> But this is still great progress I think!
>>>>>> >>>>>>
>>>>>> >>>>>> SELECT
>>>>>> >>>>>>   "todos".*,
>>>>>> >>>>>>   ARRAY(
>>>>>> >>>>>>       SELECT
>>>>>> >>>>>>           "users".*,
>>>>>> >>>>>>           ARRAY(
>>>>>> >>>>>>               SELECT
>>>>>> >>>>>>                   "todos".*
>>>>>> >>>>>>               FROM
>>>>>> >>>>>>                   "todos"
>>>>>> >>>>>>           ) AS "todos"
>>>>>> >>>>>>       FROM
>>>>>> >>>>>>           "users"
>>>>>> >>>>>>   ) AS "users"
>>>>>> >>>>>> FROM
>>>>>> >>>>>>   "todos"
>>>>>> >>>>>> WHERE
>>>>>> >>>>>>   "user_id" IN (
>>>>>> >>>>>>       SELECT
>>>>>> >>>>>>           "user_id"
>>>>>> >>>>>>       FROM
>>>>>> >>>>>>           "users"
>>>>>> >>>>>>       WHERE
>>>>>> >>>>>>           "house_id" IN (
>>>>>> >>>>>>               SELECT
>>>>>> >>>>>>                   "id"
>>>>>> >>>>>>               FROM
>>>>>> >>>>>>                   "houses"
>>>>>> >>>>>>           )
>>>>>> >>>>>>   );
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>
>>>>>> +----+---------+------------------------+------------------------------------------------------------------------------+
>>>>>> >>>>>> | id | user_id |      description       |
>>>>>> >>>>>>                                           |
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>
>>>>>> +----+---------+------------------------+------------------------------------------------------------------------------+
>>>>>> >>>>>> | 1  | 1       | Take out the trash     | [{1, John, 1, [{1,
>>>>>> 1, Take
>>>>>> >>>> out
>>>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>>>> >>>>>> | 2  | 1       | Watch my favorite show | [{1, John, 1, [{1,
>>>>>> 1, Take
>>>>>> >>>> out
>>>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>>>> >>>>>> | 3  | 1       | Charge my phone        | [{1, John, 1, [{1,
>>>>>> 1, Take
>>>>>> >>>> out
>>>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>>>> >>>>>> | 4  | 2       | Cook dinner            | [{1, John, 1, [{1,
>>>>>> 1, Take
>>>>>> >>>> out
>>>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>>>> >>>>>> | 5  | 2       | Read a book            | [{1, John, 1, [{1,
>>>>>> 1, Take
>>>>>> >>>> out
>>>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>>>> >>>>>> | 6  | 2       | Organize office        | [{1, John, 1, [{1,
>>>>>> 1, Take
>>>>>> >>>> out
>>>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>>>> >>>>>> | 7  | 3       | Walk the dog           | [{1, John, 1, [{1,
>>>>>> 1, Take
>>>>>> >>>> out
>>>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>>>> >>>>>> | 8  | 3       | Feed the cat           | [{1, John, 1, [{1,
>>>>>> 1, Take
>>>>>> >>>> out
>>>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>
>>>>>> +----+---------+------------------------+------------------------------------------------------------------------------+
>>>>>> >>>>>>
>>>>>> >>>>>> On Sat, Feb 12, 2022 at 4:13 PM Gavin Ray <
>>>>>> ray.gavi...@gmail.com>
>>>>>> >>>> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>>> Nevermind, this is a standard term not something
>>>>>> Calcite-specific it
>>>>>> >>>>>>> seems!
>>>>>> >>>>>>>
>>>>>> >>>>>>> https://en.wikipedia.org/wiki/Correlated_subquery
>>>>>> >>>>>>>
>>>>>> >>>>>>> On Sat, Feb 12, 2022 at 3:46 PM Gavin Ray <
>>>>>> ray.gavi...@gmail.com>
>>>>>> >>>>> wrote:
>>>>>> >>>>>>>
>>>>>> >>>>>>>> Forgive my ignorance/lack of experience
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> I am somewhat familiar with the ARRAY() function, but not
>>>>>> sure I
>>>>>> >> know
>>>>>> >>>>>>>> the term "correlated"
>>>>>> >>>>>>>> Searching the Calcite codebase for uses of "correlated" +
>>>>>> "query", I
>>>>>> >>>>>>>> found:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>
>>>>>> https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> @Test void testCorrelatedSubQueryInJoin() {
>>>>>> >>>>>>>>   final String sql = "select *\n"
>>>>>> >>>>>>>>       + "from emp as e\n"
>>>>>> >>>>>>>>       + "join dept as d using (deptno)\n"
>>>>>> >>>>>>>>       + "where d.name = (\n"
>>>>>> >>>>>>>>       + "  select max(name)\n"
>>>>>> >>>>>>>>       + "  from dept as d2\n"
>>>>>> >>>>>>>>       + "  where d2.deptno = d.deptno)";
>>>>>> >>>>>>>>   sql(sql).withExpand(false).ok();
>>>>>> >>>>>>>> }
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> But I also see this, which says it is "uncorrelated" but
>>>>>> seems very
>>>>>> >>>>>>>> similar?
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> @Test void testInUncorrelatedSubQuery() {
>>>>>> >>>>>>>>   final String sql = "select empno from emp where deptno in"
>>>>>> >>>>>>>>       + " (select deptno from dept)";
>>>>>> >>>>>>>>   sql(sql).ok();
>>>>>> >>>>>>>> }
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> I wouldn't blame you for not answering such a basic question
>>>>>> -- but
>>>>>> >>>>> what
>>>>>> >>>>>>>> exactly does "correlation" mean here?
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Thanks, as usual Julian
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde <
>>>>>> jhyde.apa...@gmail.com
>>>>>> >>>
>>>>>> >>>>>>>> wrote:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>> Correlated ARRAY sub-query?
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>>> On Feb 12, 2022, at 10:40 AM, Gavin Ray <
>>>>>> ray.gavi...@gmail.com>
>>>>>> >>>>>>>>> wrote:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> Apologies for the delay in replying
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> This makes things clear and seems obvious now that you
>>>>>> point it
>>>>>> >>>> out.
>>>>>> >>>>>>>>>> Thank you, Justin and Julian =)
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> Let me ask another question (if I may) that I am
>>>>>> struggling to
>>>>>> >>>>> phrase
>>>>>> >>>>>>>>>> easily.
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> So with GraphQL, you might have a query like:
>>>>>> >>>>>>>>>> - "Get houses"
>>>>>> >>>>>>>>>> - "For each house get the user that lives in the house
>>>>>> >>>>>>>>>> - "And for each user get their list of todos"
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> The result has to come back such that it's a single object
>>>>>> for
>>>>>> >>>> each
>>>>>> >>>>>>>>> row
>>>>>> >>>>>>>>>> ==============
>>>>>> >>>>>>>>>> {
>>>>>> >>>>>>>>>> houses: [{
>>>>>> >>>>>>>>>>  address: "123 Main Street",
>>>>>> >>>>>>>>>>  users: [{
>>>>>> >>>>>>>>>>    name: "Joe",
>>>>>> >>>>>>>>>>    todos: [{
>>>>>> >>>>>>>>>>      description: "Take out trash"
>>>>>> >>>>>>>>>>    }]
>>>>>> >>>>>>>>>>  }]
>>>>>> >>>>>>>>>> }
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> From a SQL perspective, the logical equivalent would be
>>>>>> something
>>>>>> >>>>>>>>> like:
>>>>>> >>>>>>>>>> ==============
>>>>>> >>>>>>>>>> SELECT
>>>>>> >>>>>>>>>>  house.address,
>>>>>> >>>>>>>>>>  (somehow nest users + double-nest todos under user)
>>>>>> >>>>>>>>>> FROM
>>>>>> >>>>>>>>>>  house
>>>>>> >>>>>>>>>> JOIN
>>>>>> >>>>>>>>>>  user ON user.house_id = house.id
>>>>>> >>>>>>>>>>  todos ON todos.user_id = user.id
>>>>>> >>>>>>>>>> WHERE
>>>>>> >>>>>>>>>>  house.id = 1
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> I'm not familiar enough with SQL to have figured out a way
>>>>>> to make
>>>>>> >>>>>>>>> this
>>>>>> >>>>>>>>>> kind of
>>>>>> >>>>>>>>>> query using operators that are supported across most of
>>>>>> the DB's
>>>>>> >>>>>>>>> Calcite has
>>>>>> >>>>>>>>>> adapters for.
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> Currently what I have done instead, on a tip from
>>>>>> Gopalakrishna
>>>>>> >>>>> Holla
>>>>>> >>>>>>>>> from
>>>>>> >>>>>>>>>> LinkedIn Coral team who has built GraphQL-on-Calcite, was
>>>>>> to break
>>>>>> >>>>> up
>>>>>> >>>>>>>>> the
>>>>>> >>>>>>>>>> query
>>>>>> >>>>>>>>>> into individual parts and then do the join in-memory:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> SELECT ... FROM users;
>>>>>> >>>>>>>>>> SELECT ... FROM todos WHERE todos.user_id IN (ResultSet
>>>>>> from prev
>>>>>> >>>>>>>>> response);
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> However, the way I am doing this seems like it's probably
>>>>>> very
>>>>>> >>>>>>>>> inefficient.
>>>>>> >>>>>>>>>> Because I do a series of nested loops to add the key to
>>>>>> each
>>>>>> >>>> object
>>>>>> >>>>>>>>> in the
>>>>>> >>>>>>>>>> parent ResultSet row:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>
>>>>>> https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> Is there some better way of doing this?
>>>>>> >>>>>>>>>> I would be eternally grateful for any advice.
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <
>>>>>> >>>> jhyde.apa...@gmail.com
>>>>>> >>>>>>
>>>>>> >>>>>>>>> wrote:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>>> Yes, if you want to do multiple layers of aggregation,
>>>>>> use CTEs
>>>>>> >>>>>>>>> (WITH) or
>>>>>> >>>>>>>>>>> nested sub-queries. For example, the following is I
>>>>>> believe valid
>>>>>> >>>>>>>>> standard
>>>>>> >>>>>>>>>>> SQL, and actually computes something useful:
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> WITH q1 AS
>>>>>> >>>>>>>>>>> (SELECT deptno, job, AVG(sal) AS avg_sal
>>>>>> >>>>>>>>>>>  FROM emp
>>>>>> >>>>>>>>>>>  GROUP BY deptno, job)
>>>>>> >>>>>>>>>>> WITH q2 AS
>>>>>> >>>>>>>>>>> (SELECT deptno, AVG(avg_sal) AS avg_avg_sal
>>>>>> >>>>>>>>>>>  FROM q1
>>>>>> >>>>>>>>>>>  GROUP BY deptno)
>>>>>> >>>>>>>>>>> SELECT AVG(avg_avg_sal)
>>>>>> >>>>>>>>>>> FROM q2
>>>>>> >>>>>>>>>>> GROUP BY ()
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> (You can omit the “GROUP BY ()” line, but I think it
>>>>>> makes things
>>>>>> >>>>>>>>> clearer.)
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> Julian
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart <
>>>>>> >>>>> greenl...@gmail.com>
>>>>>> >>>>>>>>>>> wrote:
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> I wish you could unsend emails :)  Answering my own
>>>>>> question,
>>>>>> >>>> no,
>>>>>> >>>>>>>>> because
>>>>>> >>>>>>>>>>>> that would return three rows with the average :D
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <
>>>>>> >>>>>>>>> greenl...@gmail.com>
>>>>>> >>>>>>>>>>> wrote:
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>> Just out of curiosity, is the second level aggregation
>>>>>> using
>>>>>> >>>> AVG
>>>>>> >>>>>>>>> in a
>>>>>> >>>>>>>>>>>>> window context?  It the frame is the whole table and it
>>>>>> >>>>> aggregates
>>>>>> >>>>>>>>> over
>>>>>> >>>>>>>>>>> it?
>>>>>> >>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <
>>>>>> >>>>>>>>> greenl...@gmail.com>
>>>>>> >>>>>>>>>>>>> wrote:
>>>>>> >>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>> That is really neat about Oracle.
>>>>>> >>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>> The alternative in general is to use a subquery:
>>>>>> >>>>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>>>>>> >>>>>>>>>>>>>> becomes
>>>>>> >>>>>>>>>>>>>> select avg(the_avg)
>>>>>> >>>>>>>>>>>>>> from (select avg(sal) from emp group b deptno)
>>>>>> an_alias;
>>>>>> >>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>> or
>>>>>> >>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>> with the_cte as (select avg(sal) x from emp group by
>>>>>> deptno)
>>>>>> >>>>>>>>>>>>>> select avg(x) from the_cte;
>>>>>> >>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <
>>>>>> >>>>>>>>> jhyde.apa...@gmail.com>
>>>>>> >>>>>>>>>>>>>> wrote:
>>>>>> >>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> Some databases, e.g. Oracle, allow TWO levels of
>>>>>> nesting:
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> SELECT avg(sal) FROM emp GROUP BY deptno;
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> AVG(SAL)
>>>>>> >>>>>>>>>>>>>>> ========
>>>>>> >>>>>>>>>>>>>>> 1,566.67
>>>>>> >>>>>>>>>>>>>>> 2,175.00
>>>>>> >>>>>>>>>>>>>>> 2,916.65
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> AVG(SUM(SAL))
>>>>>> >>>>>>>>>>>>>>> =============
>>>>>> >>>>>>>>>>>>>>>     9,675
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> The first level aggregates by department (returning 3
>>>>>> >>>> records),
>>>>>> >>>>>>>>> and
>>>>>> >>>>>>>>>>> the
>>>>>> >>>>>>>>>>>>>>> second level computes the grand total (returning 1
>>>>>> record).
>>>>>> >>>> But
>>>>>> >>>>>>>>> that
>>>>>> >>>>>>>>>>> is an
>>>>>> >>>>>>>>>>>>>>> exceptional case.
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> Generally, any expression in the SELECT or HAVING
>>>>>> clause of
>>>>>> >>>> an
>>>>>> >>>>>>>>>>> aggregate
>>>>>> >>>>>>>>>>>>>>> query is either ‘before’ or ‘after’ aggregation.
>>>>>> Consider
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b
>>>>>> >>>>>>>>>>>>>>> FROM t
>>>>>> >>>>>>>>>>>>>>> GROUP BY t.x
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> The expressions “t.y” and “t.y + 3” occur before
>>>>>> aggregation;
>>>>>> >>>>>>>>> “t.x”,
>>>>>> >>>>>>>>>>>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)”
>>>>>> occur after
>>>>>> >>>>>>>>>>> aggregation.
>>>>>> >>>>>>>>>>>>>>> SQL semantics rely heavily on this stratification.
>>>>>> Allowing
>>>>>> >>>> an
>>>>>> >>>>>>>>> extra
>>>>>> >>>>>>>>>>> level
>>>>>> >>>>>>>>>>>>>>> of aggregation would mess it all up.
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>> Julian
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <
>>>>>> >>>>>>>>> greenl...@gmail.com>
>>>>>> >>>>>>>>>>>>>>> wrote:
>>>>>> >>>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>> This is a SQL limitation.
>>>>>> >>>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>> mysql> select sum(1);
>>>>>> >>>>>>>>>>>>>>>> +--------+
>>>>>> >>>>>>>>>>>>>>>> | sum(1) |
>>>>>> >>>>>>>>>>>>>>>> +--------+
>>>>>> >>>>>>>>>>>>>>>> |      1 |
>>>>>> >>>>>>>>>>>>>>>> +--------+
>>>>>> >>>>>>>>>>>>>>>> 1 row in set (0.00 sec)
>>>>>> >>>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>> mysql> select sum(sum(1));
>>>>>> >>>>>>>>>>>>>>>> ERROR 1111 (HY000): Invalid use of group function
>>>>>> >>>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <
>>>>>> >>>>>>>>> ray.gavi...@gmail.com>
>>>>>> >>>>>>>>>>>>>>> wrote:
>>>>>> >>>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>>> Went to test this query out and found that it can't
>>>>>> be
>>>>>> >>>>>>>>> performed:
>>>>>> >>>>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>>> SELECT
>>>>>> >>>>>>>>>>>>>>>>> JSON_OBJECT(
>>>>>> >>>>>>>>>>>>>>>>>    KEY 'users'
>>>>>> >>>>>>>>>>>>>>>>>    VALUE JSON_ARRAYAGG(
>>>>>> >>>>>>>>>>>>>>>>>        JSON_OBJECT(
>>>>>> >>>>>>>>>>>>>>>>>            KEY 'name' VALUE "users"."name",
>>>>>> >>>>>>>>>>>>>>>>>            KEY 'todos' VALUE JSON_ARRAYAGG(
>>>>>> >>>>>>>>>>>>>>>>>                JSON_OBJECT(
>>>>>> >>>>>>>>>>>>>>>>>                    KEY 'description' VALUE
>>>>>> >>>>>>>>> "todos"."description"
>>>>>> >>>>>>>>>>>>>>>>>                )
>>>>>> >>>>>>>>>>>>>>>>>            )
>>>>>> >>>>>>>>>>>>>>>>>        )
>>>>>> >>>>>>>>>>>>>>>>>    )
>>>>>> >>>>>>>>>>>>>>>>> )
>>>>>> >>>>>>>>>>>>>>>>> FROM
>>>>>> >>>>>>>>>>>>>>>>> "users"
>>>>>> >>>>>>>>>>>>>>>>> LEFT OUTER JOIN
>>>>>> >>>>>>>>>>>>>>>>> "todos" ON "users"."id" = "todos"."user_id";
>>>>>> >>>>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>>> Checking the source, seems this is a blanket
>>>>>> policy, not a
>>>>>> >>>>>>>>>>>>>>>>> datasource-specific thing.
>>>>>> >>>>>>>>>>>>>>>>> From a functional perspective, it doesn't feel like
>>>>>> it's
>>>>>> >>>> much
>>>>>> >>>>>>>>>>>>>>> different
>>>>>> >>>>>>>>>>>>>>>>> from JOINs
>>>>>> >>>>>>>>>>>>>>>>> But I don't understand relational theory or DB
>>>>>> >>>> functionality
>>>>>> >>>>>>>>> in the
>>>>>> >>>>>>>>>>>>>>> least,
>>>>>> >>>>>>>>>>>>>>>>> so I'm not fit to judge.
>>>>>> >>>>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>>> Just curious why Calcite doesn't allow this
>>>>>> >>>>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>>>>
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>
>>>>>>
>>>>>

Re: Why are nested aggregations illegal? Best alternatives?

Reply via email to