[jira] [Commented] (SPARK-40278) Used databricks spark-sql-pref with Spark 3.3 to run 3TB tpcds test failed

2023-02-21 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691877#comment-17691877
 ] 

Yang Jie commented on SPARK-40278:
--

OK, I'll change the status to `RESOLVED`.

> Used databricks spark-sql-pref with Spark 3.3 to run 3TB tpcds test failed
> --
>
> Key: SPARK-40278
> URL: https://issues.apache.org/jira/browse/SPARK-40278
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Major
>
> I used Databricks spark-sql-perf with Spark 3.3 to run the 3TB TPC-DS q24a or 
> q24b; the test code is as follows:
> {code:java}
> val rootDir = "hdfs://${clusterName}/tpcds-data/POCGenData3T"
> val databaseName = "tpcds_database"
> val scaleFactor = "3072"
> val format = "parquet"
> import com.databricks.spark.sql.perf.tpcds.TPCDSTables
> val tables = new TPCDSTables(
>   spark.sqlContext,
>   dsdgenDir = "./tpcds-kit/tools",
>   scaleFactor = scaleFactor,
>   useDoubleForDecimal = false,
>   useStringForDate = false)
> spark.sql(s"create database $databaseName")
> tables.createTemporaryTables(rootDir, format)
> spark.sql(s"use $databaseName")
> // TPCDS 24a or 24b
> val result = spark.sql(""" with ssales as
>  (select c_last_name, c_first_name, s_store_name, ca_state, s_state, i_color,
>         i_current_price, i_manager_id, i_units, i_size, sum(ss_net_paid) netpaid
>  from store_sales, store_returns, store, item, customer, customer_address
>  where ss_ticket_number = sr_ticket_number
>    and ss_item_sk = sr_item_sk
>    and ss_customer_sk = c_customer_sk
>    and ss_item_sk = i_item_sk
>    and ss_store_sk = s_store_sk
>    and c_birth_country = upper(ca_country)
>    and s_zip = ca_zip
>  and s_market_id = 8
>  group by c_last_name, c_first_name, s_store_name, ca_state, s_state, i_color,
>           i_current_price, i_manager_id, i_units, i_size)
>  select c_last_name, c_first_name, s_store_name, sum(netpaid) paid
>  from ssales
>  where i_color = 'pale'
>  group by c_last_name, c_first_name, s_store_name
>  having sum(netpaid) > (select 0.05*avg(netpaid) from ssales)""").collect()
>  sc.stop() {code}
> The above test may fail with `Stage cancelled because SparkContext was 
> shut down` on stages 31 and 36 when AQE is enabled, as follows:
>  
> !image-2022-08-30-21-09-48-763.png!
> !image-2022-08-30-21-10-24-862.png!
> !image-2022-08-30-21-10-57-128.png!
>  
> The DAG corresponding to the SQL is as follows:
> !image-2022-08-30-21-11-50-895.png!
> The details are as follows:
>  
>  
> {code:java}
> == Physical Plan ==
> AdaptiveSparkPlan (42)
> +- == Final Plan ==
>LocalTableScan (1)
> +- == Initial Plan ==
>Filter (41)
>+- HashAggregate (40)
>   +- Exchange (39)
>  +- HashAggregate (38)
> +- HashAggregate (37)
>+- Exchange (36)
>   +- HashAggregate (35)
>  +- Project (34)
> +- BroadcastHashJoin Inner BuildRight (33)
>:- Project (29)
>:  +- BroadcastHashJoin Inner BuildRight (28)
>: :- Project (24)
>: :  +- BroadcastHashJoin Inner BuildRight (23)
>: : :- Project (19)
>: : :  +- BroadcastHashJoin Inner BuildRight (18)
>: : : :- Project (13)
>: : : :  +- SortMergeJoin Inner (12)
>: : : : :- Sort (6)
>: : : : :  +- Exchange (5)
>: : : : : +- Project (4)
>: : : : :+- Filter (3)
>: : : : :   +- Scan parquet  (2)
>: : : : +- Sort (11)
>: : : :+- Exchange (10)
>: : : :   +- Project (9)
>: : : :  +- Filter (8)
>: : : : : +- Scan parquet  (7)
>: : : +- BroadcastExchange (17)
>: : :+- Project (16)
>: : :   +- Filter (15)
>: : :  +- Scan parquet  (14)
>: : +- BroadcastExchange (22)
>: :+- Filter (21)
>: :   +- Scan parquet  (20)
> {code}

[jira] [Commented] (SPARK-40278) Used databricks spark-sql-pref with Spark 3.3 to run 3TB tpcds test failed

2023-02-21 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691861#comment-17691861
 ] 

XiDuo You commented on SPARK-40278:
---

It should work now (3.4). We determine the SQL execution status by tracking the 
`SparkListenerSQLExecutionEnd` event. See 
https://issues.apache.org/jira/browse/SPARK-40834
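
For reference, a minimal sketch of observing that event from the listener bus. This is not the fix itself, just an illustration of the mechanism; it assumes a live `SparkContext` and the Spark SQL classes on the classpath, and the listener name is hypothetical:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd

// Sketch: SQL-level events arrive through onOtherEvent, not a dedicated hook,
// so we pattern-match on the event type to catch query completion.
class SqlExecutionEndTracker extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case end: SparkListenerSQLExecutionEnd =>
      // executionId ties back to the query shown on the SQL tab of the UI.
      println(s"SQL execution ${end.executionId} ended at ${end.time}")
    case _ => // ignore all other events
  }
}

// Registration (requires a running SparkContext):
// spark.sparkContext.addSparkListener(new SqlExecutionEndTracker)
```

Tracking completion this way lets the UI (and benchmark harnesses) distinguish a finished query from stages that were merely cancelled by AQE replanning or context shutdown.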


[jira] [Commented] (SPARK-40278) Used databricks spark-sql-pref with Spark 3.3 to run 3TB tpcds test failed

2023-02-20 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691395#comment-17691395
 ] 

Yang Jie commented on SPARK-40278:
--

The SQL did not fail, but the UI may break. [~ulysses] explained this at 
[https://github.com/apache/spark/pull/35149#issuecomment-1231712806] and 
has tried to fix the issue, but I'm not sure whether it has been fixed.



[jira] [Commented] (SPARK-40278) Used databricks spark-sql-pref with Spark 3.3 to run 3TB tpcds test failed

2023-02-20 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691388#comment-17691388
 ] 

Yuming Wang commented on SPARK-40278:
-

[~LuciferYang] Does this issue still exist?
