Forgive my ignorance/lack of experience I am somewhat familiar with the ARRAY() function, but not sure I know the term "correlated" Searching the Calcite codebase for uses of "correlated" + "query", I found:
https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612 @Test void testCorrelatedSubQueryInJoin() { final String sql = "select *\n" + "from emp as e\n" + "join dept as d using (deptno)\n" + "where d.name = (\n" + " select max(name)\n" + " from dept as d2\n" + " where d2.deptno = d.deptno)"; sql(sql).withExpand(false).ok(); } But I also see this, which says it is "uncorrelated" but seems very similar? @Test void testInUncorrelatedSubQuery() { final String sql = "select empno from emp where deptno in" + " (select deptno from dept)"; sql(sql).ok(); } I wouldn't blame you for not answering such a basic question -- but what exactly does "correlation" mean here? Thanks, as usual Julian On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde <jhyde.apa...@gmail.com> wrote: > Correlated ARRAY sub-query? > > > On Feb 12, 2022, at 10:40 AM, Gavin Ray <ray.gavi...@gmail.com> wrote: > > > > Apologies for the delay in replying > > > > This makes things clear and seems obvious now that you point it out. > > Thank you, Justin and Julian =) > > > > Let me ask another question (if I may) that I am struggling to phrase > > easily. > > > > So with GraphQL, you might have a query like: > > - "Get houses" > > - "For each house get the user that lives in the house > > - "And for each user get their list of todos" > > > > The result has to come back such that it's a single object for each row > > ============== > > { > > houses: [{ > > address: "123 Main Street", > > users: [{ > > name: "Joe", > > todos: [{ > > description: "Take out trash" > > }] > > }] > > } > > > > From a SQL perspective, the logical equivalent would be something like: > > ============== > > SELECT > > house.address, > > (somehow nest users + double-nest todos under user) > > FROM > > house > > JOIN > > user ON user.house_id = house.id > > todos ON todos.user_id = user.id > > WHERE > > house.id = 1 > > > > I'm not familiar enough with SQL to have figured out a way to make this > > kind of > > query using operators that are supported across most of the DB's Calcite > has > > adapters for. > > > > Currently what I have done instead, on a tip from Gopalakrishna Holla > from > > LinkedIn Coral team who has built GraphQL-on-Calcite, was to break up the > > query > > into individual parts and then do the join in-memory: > > > > SELECT ... FROM users; > > SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev > response); > > > > However, the way I am doing this seems like it's probably very > inefficient. > > Because I do a series of nested loops to add the key to each object in > the > > parent ResultSet row: > > > > > https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153 > > > > Is there some better way of doing this? > > I would be eternally grateful for any advice. > > > > > > On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <jhyde.apa...@gmail.com> > wrote: > > > >> Yes, if you want to do multiple layers of aggregation, use CTEs (WITH) > or > >> nested sub-queries. For example, the following is I believe valid > standard > >> SQL, and actually computes something useful: > >> > >> WITH q1 AS > >> (SELECT deptno, job, AVG(sal) AS avg_sal > >> FROM emp > >> GROUP BY deptno, job) > >> WITH q2 AS > >> (SELECT deptno, AVG(avg_sal) AS avg_avg_sal > >> FROM q1 > >> GROUP BY deptno) > >> SELECT AVG(avg_avg_sal) > >> FROM q2 > >> GROUP BY () > >> > >> (You can omit the “GROUP BY ()” line, but I think it makes things > clearer.) > >> > >> Julian > >> > >> > >> > >>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart <greenl...@gmail.com> > >> wrote: > >>> > >>> I wish you could unsend emails :) Answering my own question, no, > because > >>> that would return three rows with the average :D > >>> > >>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <greenl...@gmail.com> > >> wrote: > >>> > >>>> Just out of curiosity, is the second level aggregation using AVG in a > >>>> window context? It the frame is the whole table and it aggregates > over > >> it? > >>>> > >>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <greenl...@gmail.com> > >>>> wrote: > >>>> > >>>>> That is really neat about Oracle. > >>>>> > >>>>> The alternative in general is to use a subquery: > >>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno; > >>>>> becomes > >>>>> select avg(the_avg) > >>>>> from (select avg(sal) from emp group b deptno) an_alias; > >>>>> > >>>>> or > >>>>> > >>>>> with the_cte as (select avg(sal) x from emp group by deptno) > >>>>> select avg(x) from the_cte; > >>>>> > >>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <jhyde.apa...@gmail.com> > >>>>> wrote: > >>>>> > >>>>>> Some databases, e.g. Oracle, allow TWO levels of nesting: > >>>>>> > >>>>>> SELECT avg(sal) FROM emp GROUP BY deptno; > >>>>>> > >>>>>> AVG(SAL) > >>>>>> ======== > >>>>>> 1,566.67 > >>>>>> 2,175.00 > >>>>>> 2,916.65 > >>>>>> > >>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno; > >>>>>> > >>>>>> AVG(SUM(SAL)) > >>>>>> ============= > >>>>>> 9,675 > >>>>>> > >>>>>> The first level aggregates by department (returning 3 records), and > >> the > >>>>>> second level computes the grand total (returning 1 record). But that > >> is an > >>>>>> exceptional case. > >>>>>> > >>>>>> Generally, any expression in the SELECT or HAVING clause of an > >> aggregate > >>>>>> query is either ‘before’ or ‘after’ aggregation. Consider > >>>>>> > >>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b > >>>>>> FROM t > >>>>>> GROUP BY t.x > >>>>>> > >>>>>> The expressions “t.y” and “t.y + 3” occur before aggregation; “t.x”, > >>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after > >> aggregation. > >>>>>> SQL semantics rely heavily on this stratification. Allowing an extra > >> level > >>>>>> of aggregation would mess it all up. > >>>>>> > >>>>>> Julian > >>>>>> > >>>>>> > >>>>>> > >>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <greenl...@gmail.com> > >>>>>> wrote: > >>>>>>> > >>>>>>> This is a SQL limitation. > >>>>>>> > >>>>>>> mysql> select sum(1); > >>>>>>> +--------+ > >>>>>>> | sum(1) | > >>>>>>> +--------+ > >>>>>>> | 1 | > >>>>>>> +--------+ > >>>>>>> 1 row in set (0.00 sec) > >>>>>>> > >>>>>>> mysql> select sum(sum(1)); > >>>>>>> ERROR 1111 (HY000): Invalid use of group function > >>>>>>> > >>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <ray.gavi...@gmail.com> > >>>>>> wrote: > >>>>>>> > >>>>>>>> Went to test this query out and found that it can't be performed: > >>>>>>>> > >>>>>>>> SELECT > >>>>>>>> JSON_OBJECT( > >>>>>>>> KEY 'users' > >>>>>>>> VALUE JSON_ARRAYAGG( > >>>>>>>> JSON_OBJECT( > >>>>>>>> KEY 'name' VALUE "users"."name", > >>>>>>>> KEY 'todos' VALUE JSON_ARRAYAGG( > >>>>>>>> JSON_OBJECT( > >>>>>>>> KEY 'description' VALUE "todos"."description" > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> FROM > >>>>>>>> "users" > >>>>>>>> LEFT OUTER JOIN > >>>>>>>> "todos" ON "users"."id" = "todos"."user_id"; > >>>>>>>> > >>>>>>>> Checking the source, seems this is a blanket policy, not a > >>>>>>>> datasource-specific thing. > >>>>>>>> From a functional perspective, it doesn't feel like it's much > >>>>>> different > >>>>>>>> from JOINs > >>>>>>>> But I don't understand relational theory or DB functionality in > the > >>>>>> least, > >>>>>>>> so I'm not fit to judge. > >>>>>>>> > >>>>>>>> Just curious why Calcite doesn't allow this > >>>>>>>> > >>>>>> > >>>>>> > >> > >> > >