[ https://issues.apache.org/jira/browse/SPARK-20700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-20700:
-------------------------------
    Summary: InferFiltersFromConstraints stackoverflows for query (v2)  (was: Expression canonicalization hits stack overflow for query)

> InferFiltersFromConstraints stackoverflows for query (v2)
> ---------------------------------------------------------
>
>                 Key: SPARK-20700
>                 URL: https://issues.apache.org/jira/browse/SPARK-20700
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.2.0
>            Reporter: Josh Rosen
>
> The following (complicated) query eventually fails with a stack overflow during optimization:
> {code}
> CREATE TEMPORARY VIEW table_5(varchar0002_col_1, smallint_col_2, float_col_3, int_col_4, string_col_5, timestamp_col_6, string_col_7) AS VALUES
>   ('68', CAST(NULL AS SMALLINT), CAST(244.90413 AS FLOAT), -137, '571', TIMESTAMP('2015-01-14 00:00:00.0'), '947'),
>   ('82', CAST(213 AS SMALLINT), CAST(53.184647 AS FLOAT), -724, '-278', TIMESTAMP('1999-08-15 00:00:00.0'), '437'),
>   ('-7', CAST(-15 AS SMALLINT), CAST(NULL AS FLOAT), -890, '778', TIMESTAMP('1991-05-23 00:00:00.0'), '630'),
>   ('22', CAST(676 AS SMALLINT), CAST(385.27386 AS FLOAT), CAST(NULL AS INT), '-10', TIMESTAMP('1996-09-29 00:00:00.0'), '641'),
>   ('16', CAST(430 AS SMALLINT), CAST(187.23717 AS FLOAT), 989, CAST(NULL AS STRING), TIMESTAMP('2024-04-21 00:00:00.0'), '-234'),
>   ('83', CAST(760 AS SMALLINT), CAST(-695.45386 AS FLOAT), -970, '330', CAST(NULL AS TIMESTAMP), '-740'),
>   ('68', CAST(-930 AS SMALLINT), CAST(NULL AS FLOAT), -915, '-766', CAST(NULL AS TIMESTAMP), CAST(NULL AS STRING)),
>   ('48', CAST(692 AS SMALLINT), CAST(-220.59615 AS FLOAT), 940, '-514', CAST(NULL AS TIMESTAMP), '181'),
>   ('21', CAST(44 AS SMALLINT), CAST(NULL AS FLOAT), -175, '761', TIMESTAMP('2016-06-30 00:00:00.0'), '487'),
>   ('50', CAST(953 AS SMALLINT), CAST(837.2948 AS FLOAT), 705, CAST(NULL AS STRING), CAST(NULL AS TIMESTAMP), '-62');
> CREATE VIEW bools(a, b) as values (1, true), (1, true), (1, null);
> SELECT
>   AVG(-13) OVER (ORDER BY COUNT(t1.smallint_col_2) DESC ROWS 27 PRECEDING ) AS float_col,
>   COUNT(t1.smallint_col_2) AS int_col
> FROM table_5 t1
> INNER JOIN (
>   SELECT
>     (MIN(-83) OVER (PARTITION BY t2.a ORDER BY t2.a, (t1.int_col_4) * (t1.int_col_4) ROWS BETWEEN CURRENT ROW AND 15 FOLLOWING)) NOT IN (-222, 928) AS boolean_col,
>     t2.a,
>     (t1.int_col_4) * (t1.int_col_4) AS int_col
>   FROM table_5 t1
>   LEFT JOIN bools t2 ON (t2.a) = (t1.int_col_4)
>   WHERE
>     (t1.smallint_col_2) > (t1.smallint_col_2)
>   GROUP BY
>     t2.a,
>     (t1.int_col_4) * (t1.int_col_4)
>   HAVING
>     ((t1.int_col_4) * (t1.int_col_4)) IN ((t1.int_col_4) * (t1.int_col_4), SUM(t1.int_col_4))
> ) t2 ON (((t2.int_col) = (t1.int_col_4)) AND ((t2.a) = (t1.int_col_4))) AND ((t2.a) = (t1.smallint_col_2));
> {code}
> (I haven't tried to minimize this failing case yet).
> Based on sampled jstacks from the driver, it looks like the query might be repeatedly inferring filters from constraints and then pruning those filters. Here's part of the stack at the point where it stackoverflows:
> {code}
> [... repeats ...]
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.orderCommutative(Canonicalize.scala:58)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.expressionReorder(Canonicalize.scala:63)
> at org.apache.spark.sql.catalyst.expressions.Canonicalize$.execute(Canonicalize.scala:36)
> at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:158)
> - locked <0x00000007a298b940> (a org.apache.spark.sql.catalyst.expressions.Multiply)
> at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:156)
> at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
> at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> [...]
> {code}
> I suspect this is similar to SPARK-17733, another bug involving {{InferFiltersFromConstraints}}, so I'll cc [~jiangxb1987] and [~sameerag], who worked on that earlier fix.
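
As a rough illustration of the recursion pattern visible in the trace above, here is a toy Python model. It is not Spark's Scala code, and `Attr`, `Mul`, `gather_commutative`, `nested_product`, and `overflows` are all hypothetical names introduced for this sketch. It assumes only that flattening a nested commutative expression recurses once per nesting level, so a sufficiently deep expression exhausts the stack:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Leaf expression (hypothetical stand-in for a column reference)."""
    name: str


@dataclass(frozen=True)
class Mul:
    """Commutative binary node (hypothetical stand-in for a multiplication)."""
    left: object
    right: object


def gather_commutative(e):
    """Recursively flatten nested Mul nodes into a flat list of operands."""
    if isinstance(e, Mul):
        operands = []
        for child in (e.left, e.right):
            # Each nesting level adds another call frame to the stack.
            operands.extend(gather_commutative(child))
        return operands
    return [e]


def nested_product(depth):
    """Build a left-nested product containing `depth` Mul nodes, iteratively."""
    e = Attr("x")
    for _ in range(depth):
        e = Mul(e, Attr("x"))
    return e


def overflows(depth):
    """Return True if flattening a product of this depth exhausts the stack."""
    try:
        gather_commutative(nested_product(depth))
        return False
    except RecursionError:
        return True
```

With CPython's default recursion limit, a shallow product such as `nested_product(10)` flattens without trouble, while a depth in the tens of thousands raises `RecursionError`. Whether the expressions in the failing query actually grow this way during repeated filter inference is still only a suspicion in this report.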