Josh Rosen created SPARK-20700:
----------------------------------

             Summary: Expression canonicalization hits stack overflow for query
                 Key: SPARK-20700
                 URL: https://issues.apache.org/jira/browse/SPARK-20700
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 2.2.0
            Reporter: Josh Rosen


The following (complicated) query eventually fails with a stack overflow during 
optimization:

{code}
CREATE TEMPORARY VIEW table_5(varchar0002_col_1, smallint_col_2, float_col_3, 
int_col_4, string_col_5, timestamp_col_6, string_col_7) AS VALUES
  ('68', CAST(NULL AS SMALLINT), CAST(244.90413 AS FLOAT), -137, '571', 
TIMESTAMP('2015-01-14 00:00:00.0'), '947'),
  ('82', CAST(213 AS SMALLINT), CAST(53.184647 AS FLOAT), -724, '-278', 
TIMESTAMP('1999-08-15 00:00:00.0'), '437'),
  ('-7', CAST(-15 AS SMALLINT), CAST(NULL AS FLOAT), -890, '778', 
TIMESTAMP('1991-05-23 00:00:00.0'), '630'),
  ('22', CAST(676 AS SMALLINT), CAST(385.27386 AS FLOAT), CAST(NULL AS INT), 
'-10', TIMESTAMP('1996-09-29 00:00:00.0'), '641'),
  ('16', CAST(430 AS SMALLINT), CAST(187.23717 AS FLOAT), 989, CAST(NULL AS 
STRING), TIMESTAMP('2024-04-21 00:00:00.0'), '-234'),
  ('83', CAST(760 AS SMALLINT), CAST(-695.45386 AS FLOAT), -970, '330', 
CAST(NULL AS TIMESTAMP), '-740'),
  ('68', CAST(-930 AS SMALLINT), CAST(NULL AS FLOAT), -915, '-766', CAST(NULL 
AS TIMESTAMP), CAST(NULL AS STRING)),
  ('48', CAST(692 AS SMALLINT), CAST(-220.59615 AS FLOAT), 940, '-514', 
CAST(NULL AS TIMESTAMP), '181'),
  ('21', CAST(44 AS SMALLINT), CAST(NULL AS FLOAT), -175, '761', 
TIMESTAMP('2016-06-30 00:00:00.0'), '487'),
  ('50', CAST(953 AS SMALLINT), CAST(837.2948 AS FLOAT), 705, CAST(NULL AS 
STRING), CAST(NULL AS TIMESTAMP), '-62');

CREATE VIEW bools(a, b) as values (1, true), (1, true), (1, null);

SELECT
AVG(-13) OVER (ORDER BY COUNT(t1.smallint_col_2) DESC ROWS 27 PRECEDING ) AS 
float_col,
COUNT(t1.smallint_col_2) AS int_col
FROM table_5 t1
INNER JOIN (
SELECT
(MIN(-83) OVER (PARTITION BY t2.a ORDER BY t2.a, (t1.int_col_4) * 
(t1.int_col_4) ROWS BETWEEN CURRENT ROW AND 15 FOLLOWING)) NOT IN (-222, 928) 
AS boolean_col,
t2.a,
(t1.int_col_4) * (t1.int_col_4) AS int_col
FROM table_5 t1
LEFT JOIN bools t2 ON (t2.a) = (t1.int_col_4)
WHERE
(t1.smallint_col_2) > (t1.smallint_col_2)
GROUP BY
t2.a,
(t1.int_col_4) * (t1.int_col_4)
HAVING
((t1.int_col_4) * (t1.int_col_4)) IN ((t1.int_col_4) * (t1.int_col_4), 
SUM(t1.int_col_4))
) t2 ON (((t2.int_col) = (t1.int_col_4)) AND ((t2.a) = (t1.int_col_4))) AND 
((t2.a) = (t1.smallint_col_2));
{code}

(I haven't tried to minimize this failing case yet).

Based on sampled jstacks from the driver, it looks like the query might be 
repeatedly inferring filters from constraints and then pruning those filters.

Here's part of the stack at the point where it stackoverflows:

{code}
[... repeats ...]
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.orderCommutative(Canonicalize.scala:58)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.expressionReorder(Canonicalize.scala:63)
        at 
org.apache.spark.sql.catalyst.expressions.Canonicalize$.execute(Canonicalize.scala:36)
        at 
org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:158)
        - locked <0x00000007a298b940> (a 
org.apache.spark.sql.catalyst.expressions.Multiply)
        at 
org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:156)
        at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
        at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
        at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[...]
{code}

I suspect this is similar to SPARK-17733, another bug where 
{{InferFiltersFromConstraints}}, so I'll cc [~jiangxb1987] and [~sameerag] who 
worked on that earlier fix.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to