[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: explainMode-cost.zip

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>         Attachments: 1.1.BeforeAnalyzeTable-joinreorder-disabled.txt,
> 1.2.BeforeAnalyzeTable-joinreorder-enabled.txt,
> 2.1.AfterAnalyzeTable WITHOUT ForAllColumns-joinreorder-disabled.txt,
> 2.2.AfterAnalyzeTable WITHOUT ForAllColumns-joinreorder-enabled.txt,
> 3.1.AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt,
> 3.2.AfterAnalyzeTableForAllColumns-joinreorder-enabled.txt,
> explainMode-cost.zip
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
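For readers who want to try the workaround described in the report, here is a minimal sketch in Python. It only builds the SQL text for the statements involved: `ANALYZE TABLE ... COMPUTE STATISTICS` (with or without `FOR ALL COLUMNS`) and `SET` statements that disable the two settings the reporter names. The helper names (`analyze_statements`, `set_statements`, `apply_statements`) and the `run_sql` callback are hypothetical and not part of Spark. The report does not say whether collecting column statistics avoids the slowdown, so treat that option as something to test rather than a confirmed fix.

```python
from typing import Callable, Dict, Iterable, List

# Tables listed under "Rows Count" in the report.
TPCDS_TABLES: List[str] = [
    "store_sales",
    "store_returns",
    "store",
    "item",
    "customer",
    "customer_address",
]

# The report says disabling either of these settings makes query24 fast again.
# Both are included here for illustration; one of them may be enough.
CBO_WORKAROUND_CONF: Dict[str, str] = {
    "spark.sql.cbo.joinReorder.enabled": "false",
    "spark.sql.cbo.enabled": "false",
}


def analyze_statements(tables: Iterable[str], all_columns: bool) -> List[str]:
    """Build ANALYZE TABLE statements; all_columns=False gives table-level stats only."""
    suffix = " FOR ALL COLUMNS" if all_columns else ""
    return [f"ANALYZE TABLE {table} COMPUTE STATISTICS{suffix}" for table in tables]


def set_statements(conf: Dict[str, str]) -> List[str]:
    """Build one SQL SET statement per configuration entry."""
    return [f"SET {key}={value}" for key, value in conf.items()]


def apply_statements(run_sql: Callable[[str], object], statements: Iterable[str]) -> None:
    """Run each statement through a caller-supplied function (hypothetical hook)."""
    for statement in statements:
        run_sql(statement)
```

With an existing PySpark session bound to a variable named `spark` (an assumption, not shown here), `apply_statements(spark.sql, set_statements(CBO_WORKAROUND_CONF))` would apply the workaround for that session, and `analyze_statements(TPCDS_TABLES, all_columns=False)` produces statements matching the table-level-only scenario the report describes.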
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: (was: explainMode-cost.zip)

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>         Attachments: 1.1.BeforeAnalyzeTable-joinreorder-disabled.txt,
> 1.2.BeforeAnalyzeTable-joinreorder-enabled.txt,
> 2.1.AfterAnalyzeTable WITHOUT ForAllColumns-joinreorder-disabled.txt,
> 2.2.AfterAnalyzeTable WITHOUT ForAllColumns-joinreorder-enabled.txt,
> 3.1.AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt,
> 3.2.AfterAnalyzeTableForAllColumns-joinreorder-enabled.txt,
> explainMode-cost.zip
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: 2.2.AfterAnalyzeTable WITHOUT ForAllColumns-joinreorder-enabled.txt
                3.1.AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt
                3.2.AfterAnalyzeTableForAllColumns-joinreorder-enabled.txt
                1.1.BeforeAnalyzeTable-joinreorder-disabled.txt
                1.2.BeforeAnalyzeTable-joinreorder-enabled.txt
                2.1.AfterAnalyzeTable WITHOUT ForAllColumns-joinreorder-disabled.txt

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>         Attachments: 1.1.BeforeAnalyzeTable-joinreorder-disabled.txt,
> 1.2.BeforeAnalyzeTable-joinreorder-enabled.txt,
> 2.1.AfterAnalyzeTable WITHOUT ForAllColumns-joinreorder-disabled.txt,
> 2.2.AfterAnalyzeTable WITHOUT ForAllColumns-joinreorder-enabled.txt,
> 3.1.AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt,
> 3.2.AfterAnalyzeTableForAllColumns-joinreorder-enabled.txt
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: (was: BeforeAnalyzeTable.txt)

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: (was: AfterAnalyzeTable WITHOUT ForAllColumns.txt)

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: (was: AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt)

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: (was: AfterAnalyzeTableForAllColumns.txt)

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-39971:
---
    Component/s: SQL

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>         Attachments: AfterAnalyzeTable WITHOUT ForAllColumns.txt,
> AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt,
> AfterAnalyzeTableForAllColumns.txt,
> BeforeAnalyzeTable.txt
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: (was: AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt)

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>         Attachments: AfterAnalyzeTable WITHOUT ForAllColumns.txt,
> AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt,
> AfterAnalyzeTableForAllColumns.txt,
> BeforeAnalyzeTable.txt
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>         Attachments: AfterAnalyzeTable WITHOUT ForAllColumns.txt,
> AfterAnalyzeTableForAllColumns-joinreorder-disabled.txt,
> AfterAnalyzeTableForAllColumns.txt,
> BeforeAnalyzeTable.txt
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Description:
I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without the FOR ALL COLUMNS) some queries became really slow. For example query24 - [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes between 10~15min before running the ANALYZE TABLE.
After running ANALYZE TABLE I waited 24h before cancelling the execution.
If I disable spark.sql.cbo.joinReorder.enabled or spark.sql.cbo.enabled it becomes fast again.
It seems something in join reordering is not working well when we have table stats, but not column stats.

Rows Count:
store_sales - 2879966589
store_returns - 288009578
store - 1002
item - 30
customer - 1200
customer_address - 600

  was:
I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without the FOR ALL COLUMNS) some queries became really slow. For example query24 - https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql takes between 10~15min before running the ANALYZE TABLE.
After running ANALYZE TABLE I waited 24h before cancelling the execution.
If I disable spark.sql.cbo.joinReorder.enabled or spark.sql.cbo.enabled it becomes fast again.
It seems something in join reordering is not working well when we have table stats, but not column stats.

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>         Attachments: AfterAnalyzeTable WITHOUT ForAllColumns.txt,
> AfterAnalyzeTableForAllColumns.txt,
> BeforeAnalyzeTable.txt
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql] takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
> Rows Count:
> store_sales - 2879966589
> store_returns - 288009578
> store - 1002
> item - 30
> customer - 1200
> customer_address - 600
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Attachment: AfterAnalyzeTable WITHOUT ForAllColumns.txt
                AfterAnalyzeTableForAllColumns.txt
                BeforeAnalyzeTable.txt

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>         Attachments: AfterAnalyzeTable WITHOUT ForAllColumns.txt,
> AfterAnalyzeTableForAllColumns.txt,
> BeforeAnalyzeTable.txt
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Description:
I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without the FOR ALL COLUMNS) some queries became really slow. For example query24 - https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql takes between 10~15min before running the ANALYZE TABLE.
After running ANALYZE TABLE I waited 24h before cancelling the execution.
If I disable spark.sql.cbo.joinReorder.enabled or spark.sql.cbo.enabled it becomes fast again.
It seems something in join reordering is not working well when we have table stats, but not column stats.

  was:
I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without the FOR ALL COLUMNS) some queries became really slow. For example query24 - [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql|https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql,],takes between 10~15min before running the ANALYZE TABLE.
After running ANALYZE TABLE I waited 24h before cancelling the execution.
If I disable spark.sql.cbo.joinReorder.enabled or spark.sql.cbo.enabled it becomes fast again.
It seems something in join reordering is not working well when we have table stats, but not column stats.

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.
[jira] [Updated] (SPARK-39971) ANALYZE TABLE makes some queries run forever
[ https://issues.apache.org/jira/browse/SPARK-39971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felipe updated SPARK-39971:
---
    Description:
I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without the FOR ALL COLUMNS) some queries became really slow. For example query24 - [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql|https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql,],takes between 10~15min before running the ANALYZE TABLE.
After running ANALYZE TABLE I waited 24h before cancelling the execution.
If I disable spark.sql.cbo.joinReorder.enabled or spark.sql.cbo.enabled it becomes fast again.
It seems something in join reordering is not working well when we have table stats, but not column stats.

  was:
I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without the FOR ALL COLUMNS) some queries became really slow. For example query24 - [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql,] takes between 10~15min before running the ANALYZE TABLE.
After running ANALYZE TABLE I waited 24h before cancelling the execution.
If I disable spark.sql.cbo.joinReorder.enabled or spark.sql.cbo.enabled it becomes fast again.
It seems something in join reordering is not working well when we have table stats, but not column stats.

> ANALYZE TABLE makes some queries run forever
>
>                 Key: SPARK-39971
>                 URL: https://issues.apache.org/jira/browse/SPARK-39971
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.2.2
>            Reporter: Felipe
>            Priority: Major
>
> I'm using TPCDS to run benchmarks, and after running ANALYZE TABLE (without
> the FOR ALL COLUMNS) some queries became really slow. For example query24 -
> [https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql|https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql,],takes
> between 10~15min before running the ANALYZE TABLE.
> After running ANALYZE TABLE I waited 24h before cancelling the execution.
> If I disable spark.sql.cbo.joinReorder.enabled or
> spark.sql.cbo.enabled it becomes fast again.
> It seems something in join reordering is not working well when we have table
> stats, but not column stats.