It’s a parser error. That usually means that the user has made a mistake.
Try your SQL on another DB with JSON support before you declare this a Calcite bug. Julian > On Feb 23, 2022, at 09:22, Gavin Ray <ray.gavi...@gmail.com> wrote: > > No dice still unfortunately =/ > > If it's any easier, I put a self-contained single class reproduction on > Github: > https://github.com/GavinRay97/calcite-nested-json-subquery-bug/blob/main/src/test/java/com/example/AppTest.java > > Is it worth filing a JIRA ticket over you think? > > >> On Wed, Feb 23, 2022 at 3:45 AM Julian Hyde <jhyde.apa...@gmail.com> wrote: >> >> Try ‘value ((‘ in place of ‘value (‘. >> >> Julian >> >>>> On Feb 21, 2022, at 9:33 AM, Gavin Ray <ray.gavi...@gmail.com> wrote: >>> >>> I hadn't thought about the fact that ORM's probably have to solve this >>> problem as well >>> That is a great suggestion, I will try to investigate some of the popular >>> ORM codebases and see if there are any tricks they are using. >>> >>> I seem to maybe be getting a tiny bit closer by using subqueries like >>> Julian suggested instead of operator calls >>> But if I may ask what is probably a very stupid question: >>> >>> What might the error message >>> "parse failed: Query expression encountered in illegal context >>> (state=,code=0)" >>> >>> Mean in the below query? >>> >>> The reason why I am confused is because the query runs if I remove the >>> innermost subquery ("todos") >>> But the innermost subquery is a direct copy-paste of the subquery above >> it, >>> so I know it MUST be valid >>> >>> As usual, thank you so much for your help/guidance Stamatis. >>> >>> select >>> "g0"."id" "id", >>> "g0"."address" "address", >>> ( >>> select json_arrayagg(json_object( >>> key 'id' value "g1"."id", >>> key 'todos' value ( >>> select json_arrayagg(json_object( >>> key 'id' value "g2"."id", >>> key 'description' value "g2"."description", >>> )) >>> from ( >>> select * from "todos" >>> where "g1"."id" = "user_id" >>> order by "id" >>> ) "g2" >>> ) >>> )) >>> from ( >>> select * from "users" >>> where "g0"."id" = "house_id" >>> order by "id" >>> ) "g1" >>> ) "users" >>> from "houses" "g0" >>> order by "g0"."id" >>> >>>> On Mon, Feb 21, 2022 at 8:07 AM Stamatis Zampetakis <zabe...@gmail.com> >>>> wrote: >>>> >>>> Hi Gavin, >>>> >>>> A few more comments in case they help to get you a bit further on your >>>> work. >>>> >>>> The need to return the result as a single object is a common problem in >>>> object relational mapping (ORM) frameworks/APIS (JPA, Datanucleus, >>>> Hibernate, etc.). Apart from the suggestions so far maybe you could look >>>> into these frameworks as well for more inspiration. >>>> >>>> Moreover your approach of decomposing the query into individual parts is >>>> commonly known as the N+1 problem [1]. >>>> >>>> Lastly, keep in mind that you can introduce custom UDF, UDAF functions >> if >>>> you need more flexibility on reconstructing the final result. >>>> >>>> Best, >>>> Stamatis >>>> >>>> [1] >>>> >>>> >> https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping >>>> >>>>> On Sun, Feb 13, 2022 at 3:59 AM Gavin Ray <ray.gavi...@gmail.com> >> wrote: >>>>> >>>>> Ah wait nevermind, got excited and spoke too soon. Looking at it more >>>>> closely, that data isn't correct. >>>>> At least it's in somewhat the right shape, ha! >>>>> >>>>>> On Sat, Feb 12, 2022 at 9:57 PM Gavin Ray <ray.gavi...@gmail.com> >> wrote: >>>>> >>>>>> After ~5 hours, I think I may have made some progress =) >>>>>> >>>>>> I have this, which currently works. The problem is that the nested >>>>> columns >>>>>> don't have names on them. >>>>>> Since I need to return a nested "Map<String, Object>", I have to >> figure >>>>>> out how to convert this query into a form that gives column names. >>>>>> >>>>>> But this is still great progress I think! >>>>>> >>>>>> SELECT >>>>>> "todos".*, >>>>>> ARRAY( >>>>>> SELECT >>>>>> "users".*, >>>>>> ARRAY( >>>>>> SELECT >>>>>> "todos".* >>>>>> FROM >>>>>> "todos" >>>>>> ) AS "todos" >>>>>> FROM >>>>>> "users" >>>>>> ) AS "users" >>>>>> FROM >>>>>> "todos" >>>>>> WHERE >>>>>> "user_id" IN ( >>>>>> SELECT >>>>>> "user_id" >>>>>> FROM >>>>>> "users" >>>>>> WHERE >>>>>> "house_id" IN ( >>>>>> SELECT >>>>>> "id" >>>>>> FROM >>>>>> "houses" >>>>>> ) >>>>>> ); >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >> +----+---------+------------------------+------------------------------------------------------------------------------+ >>>>>> | id | user_id | description | >>>>>> | >>>>>> >>>>>> >>>>> >>>> >> +----+---------+------------------------+------------------------------------------------------------------------------+ >>>>>> | 1 | 1 | Take out the trash | [{1, John, 1, [{1, 1, Take >>>> out >>>>>> the trash}, {2, 1, Watch my favorite show}, { | >>>>>> | 2 | 1 | Watch my favorite show | [{1, John, 1, [{1, 1, Take >>>> out >>>>>> the trash}, {2, 1, Watch my favorite show}, { | >>>>>> | 3 | 1 | Charge my phone | [{1, John, 1, [{1, 1, Take >>>> out >>>>>> the trash}, {2, 1, Watch my favorite show}, { | >>>>>> | 4 | 2 | Cook dinner | [{1, John, 1, [{1, 1, Take >>>> out >>>>>> the trash}, {2, 1, Watch my favorite show}, { | >>>>>> | 5 | 2 | Read a book | [{1, John, 1, [{1, 1, Take >>>> out >>>>>> the trash}, {2, 1, Watch my favorite show}, { | >>>>>> | 6 | 2 | Organize office | [{1, John, 1, [{1, 1, Take >>>> out >>>>>> the trash}, {2, 1, Watch my favorite show}, { | >>>>>> | 7 | 3 | Walk the dog | [{1, John, 1, [{1, 1, Take >>>> out >>>>>> the trash}, {2, 1, Watch my favorite show}, { | >>>>>> | 8 | 3 | Feed the cat | [{1, John, 1, [{1, 1, Take >>>> out >>>>>> the trash}, {2, 1, Watch my favorite show}, { | >>>>>> >>>>>> >>>>> >>>> >> +----+---------+------------------------+------------------------------------------------------------------------------+ >>>>>> >>>>>> On Sat, Feb 12, 2022 at 4:13 PM Gavin Ray <ray.gavi...@gmail.com> >>>> wrote: >>>>>> >>>>>>> Nevermind, this is a standard term not something Calcite-specific it >>>>>>> seems! >>>>>>> >>>>>>> https://en.wikipedia.org/wiki/Correlated_subquery >>>>>>> >>>>>>> On Sat, Feb 12, 2022 at 3:46 PM Gavin Ray <ray.gavi...@gmail.com> >>>>> wrote: >>>>>>> >>>>>>>> Forgive my ignorance/lack of experience >>>>>>>> >>>>>>>> I am somewhat familiar with the ARRAY() function, but not sure I >> know >>>>>>>> the term "correlated" >>>>>>>> Searching the Calcite codebase for uses of "correlated" + "query", I >>>>>>>> found: >>>>>>>> >>>>>>>> >>>>>>>> >>>>> >>>> >> https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612 >>>>>>>> >>>>>>>> @Test void testCorrelatedSubQueryInJoin() { >>>>>>>> final String sql = "select *\n" >>>>>>>> + "from emp as e\n" >>>>>>>> + "join dept as d using (deptno)\n" >>>>>>>> + "where d.name = (\n" >>>>>>>> + " select max(name)\n" >>>>>>>> + " from dept as d2\n" >>>>>>>> + " where d2.deptno = d.deptno)"; >>>>>>>> sql(sql).withExpand(false).ok(); >>>>>>>> } >>>>>>>> >>>>>>>> But I also see this, which says it is "uncorrelated" but seems very >>>>>>>> similar? >>>>>>>> >>>>>>>> @Test void testInUncorrelatedSubQuery() { >>>>>>>> final String sql = "select empno from emp where deptno in" >>>>>>>> + " (select deptno from dept)"; >>>>>>>> sql(sql).ok(); >>>>>>>> } >>>>>>>> >>>>>>>> I wouldn't blame you for not answering such a basic question -- but >>>>> what >>>>>>>> exactly does "correlation" mean here? >>>>>>>> >>>>>>>> Thanks, as usual Julian >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde <jhyde.apa...@gmail.com >>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Correlated ARRAY sub-query? >>>>>>>>> >>>>>>>>>> On Feb 12, 2022, at 10:40 AM, Gavin Ray <ray.gavi...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Apologies for the delay in replying >>>>>>>>>> >>>>>>>>>> This makes things clear and seems obvious now that you point it >>>> out. >>>>>>>>>> Thank you, Justin and Julian =) >>>>>>>>>> >>>>>>>>>> Let me ask another question (if I may) that I am struggling to >>>>> phrase >>>>>>>>>> easily. >>>>>>>>>> >>>>>>>>>> So with GraphQL, you might have a query like: >>>>>>>>>> - "Get houses" >>>>>>>>>> - "For each house get the user that lives in the house >>>>>>>>>> - "And for each user get their list of todos" >>>>>>>>>> >>>>>>>>>> The result has to come back such that it's a single object for >>>> each >>>>>>>>> row >>>>>>>>>> ============== >>>>>>>>>> { >>>>>>>>>> houses: [{ >>>>>>>>>> address: "123 Main Street", >>>>>>>>>> users: [{ >>>>>>>>>> name: "Joe", >>>>>>>>>> todos: [{ >>>>>>>>>> description: "Take out trash" >>>>>>>>>> }] >>>>>>>>>> }] >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> From a SQL perspective, the logical equivalent would be something >>>>>>>>> like: >>>>>>>>>> ============== >>>>>>>>>> SELECT >>>>>>>>>> house.address, >>>>>>>>>> (somehow nest users + double-nest todos under user) >>>>>>>>>> FROM >>>>>>>>>> house >>>>>>>>>> JOIN >>>>>>>>>> user ON user.house_id = house.id >>>>>>>>>> todos ON todos.user_id = user.id >>>>>>>>>> WHERE >>>>>>>>>> house.id = 1 >>>>>>>>>> >>>>>>>>>> I'm not familiar enough with SQL to have figured out a way to make >>>>>>>>> this >>>>>>>>>> kind of >>>>>>>>>> query using operators that are supported across most of the DB's >>>>>>>>> Calcite has >>>>>>>>>> adapters for. >>>>>>>>>> >>>>>>>>>> Currently what I have done instead, on a tip from Gopalakrishna >>>>> Holla >>>>>>>>> from >>>>>>>>>> LinkedIn Coral team who has built GraphQL-on-Calcite, was to break >>>>> up >>>>>>>>> the >>>>>>>>>> query >>>>>>>>>> into individual parts and then do the join in-memory: >>>>>>>>>> >>>>>>>>>> SELECT ... FROM users; >>>>>>>>>> SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from prev >>>>>>>>> response); >>>>>>>>>> >>>>>>>>>> However, the way I am doing this seems like it's probably very >>>>>>>>> inefficient. >>>>>>>>>> Because I do a series of nested loops to add the key to each >>>> object >>>>>>>>> in the >>>>>>>>>> parent ResultSet row: >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>> >>>> >> https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153 >>>>>>>>>> >>>>>>>>>> Is there some better way of doing this? >>>>>>>>>> I would be eternally grateful for any advice. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde < >>>> jhyde.apa...@gmail.com >>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Yes, if you want to do multiple layers of aggregation, use CTEs >>>>>>>>> (WITH) or >>>>>>>>>>> nested sub-queries. For example, the following is I believe valid >>>>>>>>> standard >>>>>>>>>>> SQL, and actually computes something useful: >>>>>>>>>>> >>>>>>>>>>> WITH q1 AS >>>>>>>>>>> (SELECT deptno, job, AVG(sal) AS avg_sal >>>>>>>>>>> FROM emp >>>>>>>>>>> GROUP BY deptno, job) >>>>>>>>>>> WITH q2 AS >>>>>>>>>>> (SELECT deptno, AVG(avg_sal) AS avg_avg_sal >>>>>>>>>>> FROM q1 >>>>>>>>>>> GROUP BY deptno) >>>>>>>>>>> SELECT AVG(avg_avg_sal) >>>>>>>>>>> FROM q2 >>>>>>>>>>> GROUP BY () >>>>>>>>>>> >>>>>>>>>>> (You can omit the “GROUP BY ()” line, but I think it makes things >>>>>>>>> clearer.) >>>>>>>>>>> >>>>>>>>>>> Julian >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart < >>>>> greenl...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I wish you could unsend emails :) Answering my own question, >>>> no, >>>>>>>>> because >>>>>>>>>>>> that would return three rows with the average :D >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart < >>>>>>>>> greenl...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Just out of curiosity, is the second level aggregation using >>>> AVG >>>>>>>>> in a >>>>>>>>>>>>> window context? It the frame is the whole table and it >>>>> aggregates >>>>>>>>> over >>>>>>>>>>> it? >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart < >>>>>>>>> greenl...@gmail.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> That is really neat about Oracle. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The alternative in general is to use a subquery: >>>>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno; >>>>>>>>>>>>>> becomes >>>>>>>>>>>>>> select avg(the_avg) >>>>>>>>>>>>>> from (select avg(sal) from emp group b deptno) an_alias; >>>>>>>>>>>>>> >>>>>>>>>>>>>> or >>>>>>>>>>>>>> >>>>>>>>>>>>>> with the_cte as (select avg(sal) x from emp group by deptno) >>>>>>>>>>>>>> select avg(x) from the_cte; >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde < >>>>>>>>> jhyde.apa...@gmail.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Some databases, e.g. Oracle, allow TWO levels of nesting: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> SELECT avg(sal) FROM emp GROUP BY deptno; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> AVG(SAL) >>>>>>>>>>>>>>> ======== >>>>>>>>>>>>>>> 1,566.67 >>>>>>>>>>>>>>> 2,175.00 >>>>>>>>>>>>>>> 2,916.65 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> AVG(SUM(SAL)) >>>>>>>>>>>>>>> ============= >>>>>>>>>>>>>>> 9,675 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The first level aggregates by department (returning 3 >>>> records), >>>>>>>>> and >>>>>>>>>>> the >>>>>>>>>>>>>>> second level computes the grand total (returning 1 record). >>>> But >>>>>>>>> that >>>>>>>>>>> is an >>>>>>>>>>>>>>> exceptional case. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Generally, any expression in the SELECT or HAVING clause of >>>> an >>>>>>>>>>> aggregate >>>>>>>>>>>>>>> query is either ‘before’ or ‘after’ aggregation. Consider >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b >>>>>>>>>>>>>>> FROM t >>>>>>>>>>>>>>> GROUP BY t.x >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The expressions “t.y” and “t.y + 3” occur before aggregation; >>>>>>>>> “t.x”, >>>>>>>>>>>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur after >>>>>>>>>>> aggregation. >>>>>>>>>>>>>>> SQL semantics rely heavily on this stratification. Allowing >>>> an >>>>>>>>> extra >>>>>>>>>>> level >>>>>>>>>>>>>>> of aggregation would mess it all up. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Julian >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart < >>>>>>>>> greenl...@gmail.com> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This is a SQL limitation. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> mysql> select sum(1); >>>>>>>>>>>>>>>> +--------+ >>>>>>>>>>>>>>>> | sum(1) | >>>>>>>>>>>>>>>> +--------+ >>>>>>>>>>>>>>>> | 1 | >>>>>>>>>>>>>>>> +--------+ >>>>>>>>>>>>>>>> 1 row in set (0.00 sec) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> mysql> select sum(sum(1)); >>>>>>>>>>>>>>>> ERROR 1111 (HY000): Invalid use of group function >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray < >>>>>>>>> ray.gavi...@gmail.com> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Went to test this query out and found that it can't be >>>>>>>>> performed: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> SELECT >>>>>>>>>>>>>>>>> JSON_OBJECT( >>>>>>>>>>>>>>>>> KEY 'users' >>>>>>>>>>>>>>>>> VALUE JSON_ARRAYAGG( >>>>>>>>>>>>>>>>> JSON_OBJECT( >>>>>>>>>>>>>>>>> KEY 'name' VALUE "users"."name", >>>>>>>>>>>>>>>>> KEY 'todos' VALUE JSON_ARRAYAGG( >>>>>>>>>>>>>>>>> JSON_OBJECT( >>>>>>>>>>>>>>>>> KEY 'description' VALUE >>>>>>>>> "todos"."description" >>>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>> FROM >>>>>>>>>>>>>>>>> "users" >>>>>>>>>>>>>>>>> LEFT OUTER JOIN >>>>>>>>>>>>>>>>> "todos" ON "users"."id" = "todos"."user_id"; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Checking the source, seems this is a blanket policy, not a >>>>>>>>>>>>>>>>> datasource-specific thing. >>>>>>>>>>>>>>>>> From a functional perspective, it doesn't feel like it's >>>> much >>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>> from JOINs >>>>>>>>>>>>>>>>> But I don't understand relational theory or DB >>>> functionality >>>>>>>>> in the >>>>>>>>>>>>>>> least, >>>>>>>>>>>>>>>>> so I'm not fit to judge. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Just curious why Calcite doesn't allow this >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>> >>>> >>