Laurens Versluis created SPARK-48194: ----------------------------------------
Summary: OOM exception / infinite loop when having multiple alias column projections Key: SPARK-48194 URL: https://issues.apache.org/jira/browse/SPARK-48194 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 3.5.1, 3.2.1 Environment: Tested using PySpark with Spark 3.2.1 and 3.5.1 using Python 3.11 and using Java Spark using Java 11 with Spark 3.2.1. Exceptions occur both on Unix and Windows (11) Reporter: Laurens Versluis Spark 3.2.1 throws a `java.lang.OutOfMemoryError: Java heap space` exception and Spark 3.5.1 seems to run into an infinite loop (but not cause an OOM!) when a query is fired that uses multiple times the same column under different aliases when combined with a JOIN and Common Table Expressions. We have a service in production that generates SQL that operates per "KPI" and another component that combines the relevant parquet datasets and combines them. I have extraced from the logs a Minimal Reproducible Example. Weirdly, it seems related to both the JOIN and the column names used in the queries. Renaming the columns or replacing the JOIN with a UNION does not give me the error. The setup is as follows: {code:java} df = spark.createDataFrame([(0.374, -28.039, True, True)], ['mtdcorrected_overlay_x', 'mtdcorrected_overlay_y', 'target_valid_x', 'target_valid_y']) df.createOrReplaceTempView("mtd1")df2 = spark.createDataFrame([(4.0, 0.0, True, True)], ['mtdcorrected_overlay_x', 'mtdcorrected_overlay_y', 'target_valid_x', 'target_valid_y']) df2.createOrReplaceTempView("mtd2"){code} Next, we JOIN the two datasets: {code:java} df3 = spark.sql(r""" SELECT `target_valid_y` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `mtdcorrected_overlay_x` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `target_valid_x` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` FROM mtd1 """) df4 = spark.sql(r""" SELECT `target_valid_y` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `mtdcorrected_overlay_x` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `target_valid_x` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` FROM mtd2 """) df3.join(df4, [ "COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid", "COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid", "COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid", "COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid", "COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid" ], "fullouter").distinct().createOrReplaceTempView("TEMP_VIEW") {code} And finally, we fire the query: {code:java} spark.sql(r""" WITH kpiNullFilterView AS (WITH validityColumnsView0 AS (SELECT `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` FROM `TEMP_VIEW` WHERE ( `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL) AND ( `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE)) SELECT MIN(CASE WHEN `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END) `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, MIN(CASE WHEN `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END) `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END, 2.5E-1) `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END, 2.5E-1) `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END, 5E-1) `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END, 5E-1) `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END, 7.5E-1) `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END, 7.5E-1) `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, MAX(CASE WHEN `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END) `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, MAX(CASE WHEN `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END) `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` FROM validityColumnsView0) SELECT `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` FROM kpiNullFilterView WHERE `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL""").show(1,False, True) {code} If you analyze the JOIN and statistics query, you notice that it actually only uses 4 distinct column and a bunch of aliases. If you replace all aliases with the original column, you get essentially the same query but this one returns nearly instantly with no issues whatsoever: (Update the logic above to not use aliases, but still use TEMP_VIEW as name) {code:java} spark.sql(r""" WITH validityColumnsView0 AS (SELECT `mtdcorrected_overlay_x`, `mtdcorrected_overlay_y`, `target_valid_x`, `target_valid_y` FROM `TEMP_VIEW` WHERE ( `mtdcorrected_overlay_x` IS NOT NULL OR `mtdcorrected_overlay_y` IS NOT NULL) AND ( `mtdcorrected_overlay_x` IS NOT NULL AND `target_valid_x` <> FALSE OR `mtdcorrected_overlay_y` IS NOT NULL AND `target_valid_y` <> FALSE)), kpiNullFilterView AS (SELECT MIN(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END) `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, MIN(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END) `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END, 2.5E-1) `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END, 2.5E-1) `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END, 5E-1) `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END, 5E-1) `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END, 7.5E-1) `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END, 7.5E-1) `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, MAX(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END) `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, MAX(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END) `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` FROM validityColumnsView0) SELECT `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` FROM kpiNullFilterView WHERE `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL""").show(1,False, True) {code} Even with the 2x a 1-row dataset I get the OOM on Spark 3.2.1. The real production dataset is significant bigger, so if needed, create 100k-1M row datasets as needed. I hope this is sufficient to reproduce the issue. If anything else more is needed, leave a reply. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org