[jira] [Commented] (SPARK-40278) Running a 3TB TPC-DS test with Databricks spark-sql-perf on Spark 3.3 failed
[ https://issues.apache.org/jira/browse/SPARK-40278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691877#comment-17691877 ]

Yang Jie commented on SPARK-40278:
----------------------------------

OK, I'll change the status to `RESOLVED`.

> Running a 3TB TPC-DS test with Databricks spark-sql-perf on Spark 3.3 failed
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-40278
>                 URL: https://issues.apache.org/jira/browse/SPARK-40278
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Yang Jie
>            Priority: Major
>
> I used Databricks spark-sql-perf with Spark 3.3 to run 3TB TPC-DS q24a or q24b. The test code is as follows:
> {code:java}
> val rootDir = s"hdfs://${clusterName}/tpcds-data/POCGenData3T" // s-interpolator needed; clusterName assumed in scope
> val databaseName = "tpcds_database"
> val scaleFactor = "3072"
> val format = "parquet"
>
> import com.databricks.spark.sql.perf.tpcds.TPCDSTables
> val tables = new TPCDSTables(
>   spark.sqlContext,
>   dsdgenDir = "./tpcds-kit/tools",
>   scaleFactor = scaleFactor,
>   useDoubleForDecimal = false,
>   useStringForDate = false)
>
> spark.sql(s"create database $databaseName")
> tables.createTemporaryTables(rootDir, format)
> spark.sql(s"use $databaseName")
>
> // TPC-DS q24a or q24b
> val result = spark.sql("""
>   with ssales as
>    (select c_last_name, c_first_name, s_store_name, ca_state, s_state, i_color,
>            i_current_price, i_manager_id, i_units, i_size, sum(ss_net_paid) netpaid
>     from store_sales, store_returns, store, item, customer, customer_address
>     where ss_ticket_number = sr_ticket_number
>       and ss_item_sk = sr_item_sk
>       and ss_customer_sk = c_customer_sk
>       and ss_item_sk = i_item_sk
>       and ss_store_sk = s_store_sk
>       and c_birth_country = upper(ca_country)
>       and s_zip = ca_zip
>       and s_market_id = 8
>     group by c_last_name, c_first_name, s_store_name, ca_state, s_state, i_color,
>              i_current_price, i_manager_id, i_units, i_size)
>   select c_last_name, c_first_name, s_store_name, sum(netpaid) paid
>   from ssales
>   where i_color = 'pale'
>   group by c_last_name, c_first_name, s_store_name
>   having sum(netpaid) > (select 0.05*avg(netpaid) from ssales)""").collect()
> sc.stop()
> {code}
> The above test may fail with `Stage cancelled because SparkContext was shut down` in stage 31 and stage 36 when AQE is enabled, as shown below:
>
> !image-2022-08-30-21-09-48-763.png!
> !image-2022-08-30-21-10-24-862.png!
> !image-2022-08-30-21-10-57-128.png!
>
> The DAG corresponding to the SQL is as follows:
> !image-2022-08-30-21-11-50-895.png!
> The details are as follows:
>
> {code:java}
> == Physical Plan ==
> AdaptiveSparkPlan (42)
> +- == Final Plan ==
>    LocalTableScan (1)
> +- == Initial Plan ==
>    Filter (41)
>    +- HashAggregate (40)
>       +- Exchange (39)
>          +- HashAggregate (38)
>             +- HashAggregate (37)
>                +- Exchange (36)
>                   +- HashAggregate (35)
>                      +- Project (34)
>                         +- BroadcastHashJoin Inner BuildRight (33)
>                            :- Project (29)
>                            :  +- BroadcastHashJoin Inner BuildRight (28)
>                            :     :- Project (24)
>                            :     :  +- BroadcastHashJoin Inner BuildRight (23)
>                            :     :     :- Project (19)
>                            :     :     :  +- BroadcastHashJoin Inner BuildRight (18)
>                            :     :     :     :- Project (13)
>                            :     :     :     :  +- SortMergeJoin Inner (12)
>                            :     :     :     :     :- Sort (6)
>                            :     :     :     :     :  +- Exchange (5)
>                            :     :     :     :     :     +- Project (4)
>                            :     :     :     :     :        +- Filter (3)
>                            :     :     :     :     :           +- Scan parquet (2)
>                            :     :     :     :     +- Sort (11)
>                            :     :     :     :        +- Exchange (10)
>                            :     :     :     :           +- Project (9)
>                            :     :     :     :              +- Filter (8)
>                            :     :     :     :                 +- Scan parquet (7)
>                            :     :     :     +- BroadcastExchange (17)
>                            :     :     :        +- Project (16)
>                            :     :     :           +- Filter (15)
>                            :     :     :              +- Scan parquet (14)
>                            :     :     +- BroadcastExchange (22)
>                            :     :        +- Filter (21)
>                            :     :           +- Scan parquet (20)
>                            :
> {code}
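Since the report ties the failure to AQE being enabled, here is a minimal diagnostic sketch (not part of the original report): rerunning with AQE switched off via the standard `spark.sql.adaptive.enabled` config, which defaults to true since Spark 3.2.0. The `spark` session and the `query` string are assumed from the repro above.

{code:java}
// Hypothetical diagnostic, not from the original repro: disable AQE and
// rerun the same query to see whether the stage cancellation still occurs.
spark.conf.set("spark.sql.adaptive.enabled", "false") // AQE is on by default since Spark 3.2.0
val resultNoAqe = spark.sql(query).collect()          // `query` holds the q24a SQL text above
{code}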
[jira] [Commented] (SPARK-40278) Running a 3TB TPC-DS test with Databricks spark-sql-perf on Spark 3.3 failed
[ https://issues.apache.org/jira/browse/SPARK-40278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691861#comment-17691861 ]

XiDuo You commented on SPARK-40278:
-----------------------------------

It should work now (3.4). We determine the SQL execution status by tracking the `SparkListenerSQLExecutionEnd` event. See https://issues.apache.org/jira/browse/SPARK-40834
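For reference, a minimal sketch of observing that event from user code, assuming an active SparkSession named `spark`. `SparkListenerSQLExecutionEnd` is the real event class in `org.apache.spark.sql.execution.ui` (SPARK-40834 extended it with an error field in 3.4); the listener class name below is made up for illustration:

{code:java}
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd

// Sketch of a listener that tracks when a SQL execution actually ends,
// which is how the UI decides the final status after SPARK-40834.
class SqlExecutionEndListener extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: SparkListenerSQLExecutionEnd =>
      // executionId identifies the SQL execution; time is the end timestamp.
      println(s"SQL execution ${e.executionId} ended at ${e.time}")
    case _ => // not a SQL-end event; ignore
  }
}

// Registration against a running SparkSession named `spark` (assumed):
// spark.sparkContext.addSparkListener(new SqlExecutionEndListener)
{code}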
[jira] [Commented] (SPARK-40278) Running a 3TB TPC-DS test with Databricks spark-sql-perf on Spark 3.3 failed
[ https://issues.apache.org/jira/browse/SPARK-40278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691395#comment-17691395 ]

Yang Jie commented on SPARK-40278:
----------------------------------

The SQL itself does not fail, but the UI may break. [~ulysses] explained this at https://github.com/apache/spark/pull/35149#issuecomment-1231712806 and I believe he has tried to fix the issue, but I'm not sure whether it has been fixed yet.
[jira] [Commented] (SPARK-40278) Running a 3TB TPC-DS test with Databricks spark-sql-perf on Spark 3.3 failed
[ https://issues.apache.org/jira/browse/SPARK-40278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691388#comment-17691388 ]

Yuming Wang commented on SPARK-40278:
-------------------------------------

[~LuciferYang] Does this issue still exist?