Josh Rosen created SPARK-20700:
----------------------------------

             Summary: Expression canonicalization hits stack overflow for query
                 Key: SPARK-20700
                 URL: https://issues.apache.org/jira/browse/SPARK-20700
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 2.2.0
            Reporter: Josh Rosen
The following (complicated) query eventually fails with a stack overflow during optimization:

{code}
CREATE TEMPORARY VIEW table_5(varchar0002_col_1, smallint_col_2, float_col_3, int_col_4, string_col_5, timestamp_col_6, string_col_7) AS VALUES
  ('68', CAST(NULL AS SMALLINT), CAST(244.90413 AS FLOAT), -137, '571', TIMESTAMP('2015-01-14 00:00:00.0'), '947'),
  ('82', CAST(213 AS SMALLINT), CAST(53.184647 AS FLOAT), -724, '-278', TIMESTAMP('1999-08-15 00:00:00.0'), '437'),
  ('-7', CAST(-15 AS SMALLINT), CAST(NULL AS FLOAT), -890, '778', TIMESTAMP('1991-05-23 00:00:00.0'), '630'),
  ('22', CAST(676 AS SMALLINT), CAST(385.27386 AS FLOAT), CAST(NULL AS INT), '-10', TIMESTAMP('1996-09-29 00:00:00.0'), '641'),
  ('16', CAST(430 AS SMALLINT), CAST(187.23717 AS FLOAT), 989, CAST(NULL AS STRING), TIMESTAMP('2024-04-21 00:00:00.0'), '-234'),
  ('83', CAST(760 AS SMALLINT), CAST(-695.45386 AS FLOAT), -970, '330', CAST(NULL AS TIMESTAMP), '-740'),
  ('68', CAST(-930 AS SMALLINT), CAST(NULL AS FLOAT), -915, '-766', CAST(NULL AS TIMESTAMP), CAST(NULL AS STRING)),
  ('48', CAST(692 AS SMALLINT), CAST(-220.59615 AS FLOAT), 940, '-514', CAST(NULL AS TIMESTAMP), '181'),
  ('21', CAST(44 AS SMALLINT), CAST(NULL AS FLOAT), -175, '761', TIMESTAMP('2016-06-30 00:00:00.0'), '487'),
  ('50', CAST(953 AS SMALLINT), CAST(837.2948 AS FLOAT), 705, CAST(NULL AS STRING), CAST(NULL AS TIMESTAMP), '-62');

CREATE VIEW bools(a, b) AS VALUES (1, true), (1, true), (1, null);

SELECT
  AVG(-13) OVER (ORDER BY COUNT(t1.smallint_col_2) DESC ROWS 27 PRECEDING) AS float_col,
  COUNT(t1.smallint_col_2) AS int_col
FROM table_5 t1
INNER JOIN (
  SELECT
    (MIN(-83) OVER (PARTITION BY t2.a ORDER BY t2.a, (t1.int_col_4) * (t1.int_col_4) ROWS BETWEEN CURRENT ROW AND 15 FOLLOWING)) NOT IN (-222, 928) AS boolean_col,
    t2.a,
    (t1.int_col_4) * (t1.int_col_4) AS int_col
  FROM table_5 t1
  LEFT JOIN bools t2 ON (t2.a) = (t1.int_col_4)
  WHERE (t1.smallint_col_2) > (t1.smallint_col_2)
  GROUP BY t2.a, (t1.int_col_4) * (t1.int_col_4)
  HAVING ((t1.int_col_4) * (t1.int_col_4)) IN ((t1.int_col_4) * (t1.int_col_4), SUM(t1.int_col_4))
) t2
  ON (((t2.int_col) = (t1.int_col_4)) AND ((t2.a) = (t1.int_col_4)))
  AND ((t2.a) = (t1.smallint_col_2));
{code}

(I haven't tried to minimize this failing case yet.)

Based on sampled jstacks from the driver, it looks like the query might be repeatedly inferring filters from constraints and then pruning those filters. Here's part of the stack at the point where it overflows; the same eight-frame {{gatherCommutative}} cycle recurs until the stack is exhausted:

{code}
[... repeats ...]
at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
[... the same cycle repeats many more times ...]
at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
at org.apache.spark.sql.catalyst.expressions.Canonicalize$.orderCommutative(Canonicalize.scala:58)
at org.apache.spark.sql.catalyst.expressions.Canonicalize$.expressionReorder(Canonicalize.scala:63)
at org.apache.spark.sql.catalyst.expressions.Canonicalize$.execute(Canonicalize.scala:36)
at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:158)
- locked <0x00000007a298b940> (a org.apache.spark.sql.catalyst.expressions.Multiply)
at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:156)
at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[...]
{code}

I suspect this is similar to SPARK-17733, another bug involving {{InferFiltersFromConstraints}}, so I'll cc [~jiangxb1987] and [~sameerag], who worked on that earlier fix.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org