maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307545016
########## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ########## @@ -328,16 +328,891 @@ struct<scalarsubquery():int> -- !query 25 -DROP VIEW IF EXISTS t +WITH r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 25 schema struct<> -- !query 25 output - +org.apache.spark.sql.AnalysisException +Table or view not found: r; line 4 pos 24 -- !query 26 -DROP VIEW IF EXISTS t2 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 26 schema -struct<> +struct<level:int> -- !query 26 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 27 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r +-- !query 27 schema +struct<> +-- !query 27 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 28 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r LIMIT 10 +-- !query 28 schema +struct<level:int> +-- !query 28 output +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 29 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r LIMIT 10 +-- !query 29 schema +struct<level:int,level:int> +-- !query 29 output +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 + + +-- !query 30 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r ORDER BY level LIMIT 10 +-- !query 30 schema +struct<> +-- !query 30 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 31 +WITH RECURSIVE r(c) AS ( + SELECT 'a' + UNION ALL + SELECT c || ' b' FROM r WHERE LENGTH(c) < 10 +) +SELECT * FROM r +-- !query 31 schema +struct<c:string> +-- !query 31 output +a +a b +a b b +a b b b +a b b b b +a b b b b b + + +-- !query 32 +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 10 + UNION ALL + VALUES (0) +) +SELECT * FROM r +-- !query 32 schema +struct<level:int> +-- !query 32 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 33 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 33 schema +struct<level:int,data:string> +-- !query 33 output +0 A +0 B +1 AC +1 BC +2 ACC +2 BCC +3 ACCC +3 BCCC + + +-- !query 34 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, data || 'B' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 34 schema +struct<level:int,data:string> +-- !query 34 output +0 A +1 AB +1 AC +2 ABB +2 ABC +2 ACB +2 ACC +3 ABBC +3 ABCC +3 ACBC +3 ACCC + + +-- !query 35 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'D' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 35 schema +struct<level:int,data:string> +-- !query 35 output +0 A +0 B +1 AC +1 AD +1 BC +1 BD +2 ACC +2 ACD +2 ADC +2 ADD +2 BCC +2 BCD +2 BDC +2 BDD +3 ACCD +3 ACDD +3 ADCD +3 ADDD +3 BCCD +3 BCDD +3 BDCD +3 BDDD + + +-- !query 36 +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 36 schema +struct<> +-- !query 36 output +org.apache.spark.sql.AnalysisException +Recursive query r should contain UNION or UNION ALL statements only. This error can also be caused by ORDER BY or LIMIT keywords used on result of UNION or UNION ALL.; + + +-- !query 37 +WITH RECURSIVE r(level) AS ( + VALUES (0), (0) + UNION + SELECT (level + 1) % 10 FROM r +) +SELECT * FROM r +-- !query 37 schema +struct<level:int> +-- !query 37 output +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 38 +WITH RECURSIVE r(level) AS ( + VALUES (0) + INTERSECT + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r +-- !query 38 schema +struct<> +-- !query 38 output +org.apache.spark.sql.AnalysisException +Recursive query r should contain UNION or UNION ALL statements only. This error can also be caused by ORDER BY or LIMIT keywords used on result of UNION or UNION ALL.; + + +-- !query 39 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE (SELECT SUM(level) FROM r) < 10 +) +SELECT * FROM r +-- !query 39 schema +struct<> +-- !query 39 output +org.apache.spark.sql.AnalysisException +Recursive reference r cannot be used here. This can be caused by using it on inner side of an outer join, using it with aggregate or distinct, using it in a subquery or using it multiple times in a recursive term (except for using it on different sides of an UNION ALL).; + + +-- !query 40 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT r1.level + 1, r1.data + FROM r AS r1 + JOIN r AS r2 ON r2.data = r1.data + WHERE r1.level < 10 +) +SELECT * FROM r +-- !query 40 schema +struct<> +-- !query 40 output +org.apache.spark.sql.AnalysisException +Recursive reference r cannot be used multiple times in a recursive term; + + +-- !query 41 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, r.data + FROM ( + SELECT 'B' AS data + ) AS o + LEFT JOIN r ON r.data = o.data +) +SELECT * FROM r +-- !query 41 schema +struct<> +-- !query 41 output +org.apache.spark.sql.AnalysisException +Recursive reference r cannot be used here. This can be caused by using it on inner side of an outer join, using it with aggregate or distinct, using it in a subquery or using it multiple times in a recursive term (except for using it on different sides of an UNION ALL).; + + +-- !query 42 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, r.data + FROM r + RIGHT JOIN ( + SELECT 'B' AS data + ) AS o ON o.data = r.data +) +SELECT * FROM r +-- !query 42 schema +struct<> +-- !query 42 output +org.apache.spark.sql.AnalysisException +Recursive reference r cannot be used here. This can be caused by using it on inner side of an outer join, using it with aggregate or distinct, using it in a subquery or using it multiple times in a recursive term (except for using it on different sides of an UNION ALL).; + + +-- !query 43 +WITH RECURSIVE r(level, data) AS ( + SELECT MAX(level) AS level, SUM(data) AS data FROM VALUES (0, 1), (0, 2) + UNION ALL + SELECT level + 1, data FROM r WHERE level < 10 +) +SELECT * FROM r ORDER BY level +-- !query 43 schema +struct<> +-- !query 43 output +org.apache.spark.sql.AnalysisException +cannot resolve '`level`' given input columns: [col1, col2]; line 2 pos 13 + + +-- !query 44 +WITH RECURSIVE r(group, data) AS ( + VALUES (0, 1L) + UNION ALL + SELECT 1, SUM(data) FROM r WHERE data < 10 GROUP BY group +) +SELECT * FROM r +-- !query 44 schema +struct<> +-- !query 44 output +org.apache.spark.sql.AnalysisException +Recursive reference r cannot be used here. This can be caused by using it on inner side of an outer join, using it with aggregate or distinct, using it in a subquery or using it multiple times in a recursive term (except for using it on different sides of an UNION ALL).; + + +-- !query 45 +WITH RECURSIVE r(level) AS ( + VALUES (1L) + UNION ALL + SELECT SUM(level) FROM r WHERE level < 10 +) +SELECT * FROM r +-- !query 45 schema +struct<> +-- !query 45 output +org.apache.spark.sql.AnalysisException +Recursive reference r cannot be used here. This can be caused by using it on inner side of an outer join, using it with aggregate or distinct, using it in a subquery or using it multiple times in a recursive term (except for using it on different sides of an UNION ALL).; + + +-- !query 46 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, data FROM r WHERE level < 10 +) +SELECT COUNT(*) FROM r +-- !query 46 schema +struct<count(1):bigint> +-- !query 46 output +11 + + +-- !query 47 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT DISTINCT level + 1, data FROM r WHERE level < 10 +) +SELECT * FROM r +-- !query 47 schema +struct<> +-- !query 47 output +org.apache.spark.sql.AnalysisException +Recursive reference r cannot be used here. This can be caused by using it on inner side of an outer join, using it with aggregate or distinct, using it in a subquery or using it multiple times in a recursive term (except for using it on different sides of an UNION ALL).; + + +-- !query 48 +WITH RECURSIVE y AS ( + VALUES (1) AS t(id) +), +x AS ( + SELECT * FROM y + UNION ALL + SELECT id + 1 FROM x WHERE id < 5 +) +SELECT * FROM x +-- !query 48 schema +struct<id:int> +-- !query 48 output +1 +2 +3 +4 +5 + + +-- !query 49 +WITH RECURSIVE x AS ( + VALUES (1) AS t(id) + UNION ALL + SELECT id + 1 FROM x WHERE id < 5 +), +y AS ( + VALUES (1) AS t(id) + UNION ALL + SELECT id + 1 FROM y WHERE id < 10 +) +SELECT * FROM y LEFT JOIN x ON x.id = y.id +-- !query 49 schema +struct<id:int,id:int> +-- !query 49 output +1 1 +10 NULL +2 2 +3 3 +4 4 +5 5 +6 NULL +7 NULL +8 NULL +9 NULL + + +-- !query 50 +WITH RECURSIVE x AS ( + VALUES (1) AS t(id) + UNION ALL + SELECT id + 1 FROM x WHERE id < 5 +), +y AS ( + VALUES (1) AS t(id) + UNION ALL + SELECT id + 1 FROM x WHERE id < 10 +) +SELECT * FROM y LEFT JOIN x ON x.id = y.id +-- !query 50 schema +struct<id:int,id:int> +-- !query 50 output +1 1 +2 2 +3 3 +4 4 +5 5 +6 NULL + + +-- !query 51 +WITH RECURSIVE x AS ( + SELECT 1 AS id + UNION ALL + SELECT id + 1 FROM x WHERE id < 3 +), +y AS ( + SELECT * FROM x + UNION ALL + SELECT * FROM x +), +z AS ( + SELECT * FROM x + UNION ALL + SELECT id + 1 FROM z WHERE id < 10 +) +SELECT * FROM z +-- !query 51 schema +struct<id:int> +-- !query 51 output +1 +10 +10 +10 +2 +2 +3 +3 +3 +4 +4 +4 +5 +5 +5 +6 +6 +6 +7 +7 +7 +8 +8 +8 +9 +9 +9 + + +-- !query 52 +WITH RECURSIVE x AS ( + SELECT 1 AS id + UNION ALL + SELECT id + 1 FROM x WHERE id < 3 +), +y AS ( + SELECT * FROM x + UNION ALL + SELECT * FROM x +), +z AS ( + SELECT * FROM y + UNION ALL + SELECT id + 1 FROM z WHERE id < 10 +) +SELECT * FROM z +-- !query 52 schema +struct<id:int> +-- !query 52 output +1 +1 +10 +10 +10 +10 +10 +10 +2 +2 +2 +2 +3 +3 +3 +3 +3 +3 +4 +4 +4 +4 +4 +4 +5 +5 +5 +5 +5 +5 +6 +6 +6 +6 +6 +6 +7 +7 +7 +7 +7 +7 +8 +8 +8 +8 +8 +8 +9 +9 +9 +9 +9 +9 + + +-- !query 53 +WITH t AS ( + WITH RECURSIVE s AS ( + VALUES (1) AS t(i) + UNION ALL + SELECT i + 1 FROM s + ) + SELECT i AS j FROM s LIMIT 10 +) +SELECT * FROM t +-- !query 53 schema +struct<j:int> +-- !query 53 output +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 54 +WITH RECURSIVE outermost AS ( + WITH innermost AS ( + SELECT * FROM outermost + ) + SELECT level + 1 FROM innermost WHERE level < 5 + UNION ALL + SELECT 0 AS level +) +SELECT * FROM outermost Review comment: I think we need to clearly define which syntax is accepted or not, first. Have you checked a statement definition for WITH RECURSIVE in the ANSI/SQL standard? Then, could you define it in `SqlBase.sql`? For example; ``` ctes : WITH namedQuery (',' namedQuery)* : WITH RECURSIVE name=errorCapturingIdentifier (columnAliases=identifierList)? AS? '(' recursiveQuery ')' ; recursiveQuery : initStmt=queryPrimary UNION ALL? recursiveStmt=queryPrimary ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org