Re: [PR] [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

via GitHub Tue, 30 Apr 2024 23:34:01 -0700


cxzl25 commented on PR #46273:
URL: https://github.com/apache/spark/pull/46273#issuecomment-2088046945


   > why we are skipping the tests
   
   A `LIMIT` after `GROUP BY` does not guarantee which rows are returned or in what order, so the expected row can differ from the actual one.
   ```
   [info]   == Results ==
   [info]   !== Correct Answer - 1 ==            == Spark Answer - 1 ==
   [info]    struct<id:bigint,count(1):bigint>   struct<id:bigint,count(1):bigint>
   [info]   ![1,1]                               [0,1] (QueryTest.scala:267)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   [info]   at org.apache.spark.sql.QueryTest$.newAssertionFailedException(QueryTest.scala:257)
   [info]   at org.scalatest.Assertions.fail(Assertions.scala:933)
   [info]   at org.scalatest.Assertions.fail$(Assertions.scala:929)
   [info]   at org.apache.spark.sql.QueryTest$.fail(QueryTest.scala:257)
   [info]   at org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:267)
   [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:153)
   [info]   at org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite.$anonfun$runAdaptiveAndVerifyResult$1(AdaptiveQueryExecSuite.scala:91)
   ```
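
A minimal Python sketch (not part of the PR; the helper names are hypothetical) of an order-insensitive check for this kind of query. Rather than expecting one specific row, it accepts any `limit` distinct rows drawn from the full set of groups. With `range(2)`, both `[0,1]` and `[1,1]` are acceptable answers.

```python
from collections import Counter


def group_counts(ids):
    """Emulate `select id, count(*) ... group by id` as a set of (id, count) rows."""
    return set(Counter(ids).items())


def is_valid_limited_answer(rows, all_groups, limit):
    """Return True if `rows` is an acceptable result of `... group by id limit N`."""
    rows = list(rows)
    expected_size = min(limit, len(all_groups))
    return (
        len(rows) == expected_size              # the limit is respected
        and len(set(rows)) == len(rows)         # no group appears twice
        and all(row in all_groups for row in rows)  # every row is a real group
    )


groups = group_counts(range(2))  # ids 0 and 1, each with count 1
print(is_valid_limited_answer([(0, 1)], groups, limit=1))  # True
print(is_valid_limited_answer([(1, 1)], groups, limit=1))  # True
```

This only illustrates why comparing against a single fixed row is fragile; it says nothing about how the suite itself should be changed.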
   
   > whether the test is actually testing what we expect
   
   The added UT checks whether the final execution plan of AQE contains the limit operator.
   ```bash
   ./bin/spark-sql --conf spark.driver.memory=6g
   ```
   ```sql
   set spark.sql.shuffle.partitions=16777217;
   create table foo as select id from range(2);
   select id, count(*) from foo group by id limit 1;
   ```
   
   ```
   == Physical Plan ==
   AdaptiveSparkPlan (7)
   +- == Final Plan ==
      LocalTableScan (1)
   +- == Initial Plan ==
      CollectLimit (6)
      +- HashAggregate (5)
         +- Exchange (4)
            +- HashAggregate (3)
               +- Scan hive spark_catalog.default.foo (2)
   ```
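
As a rough illustration of that check, the Python sketch below scans explain output like the one above and lists the operators under a given plan section. It is text-based and purely hypothetical (the function name and the parsing rules are assumptions made here); a test inside Spark would more likely inspect the plan objects directly.

```python
import re

# Explain output copied from the plan quoted above.
EXPLAIN = """\
== Physical Plan ==
AdaptiveSparkPlan (7)
+- == Final Plan ==
   LocalTableScan (1)
+- == Initial Plan ==
   CollectLimit (6)
   +- HashAggregate (5)
      +- Exchange (4)
         +- HashAggregate (3)
            +- Scan hive spark_catalog.default.foo (2)
"""


def plan_operators(explain_text, section):
    """List operator names found under `== <section> ==` in the explain text."""
    ops = []
    in_section = False
    for line in explain_text.splitlines():
        # Drop indentation and the "+- " tree markers.
        text = line.strip().lstrip("+- ")
        if text.startswith("=="):
            # A new section header starts; track whether it is the one we want.
            in_section = text == f"== {section} =="
            continue
        if in_section and text:
            # Remove the trailing operator id, e.g. " (6)".
            ops.append(re.sub(r"\s*\(\d+\)$", "", text))
    return ops


def has_limit(ops):
    """True if any operator name mentions a limit."""
    return any("Limit" in op for op in ops)


print(plan_operators(EXPLAIN, "Final Plan"))             # ['LocalTableScan']
print(has_limit(plan_operators(EXPLAIN, "Final Plan")))   # False
print(has_limit(plan_operators(EXPLAIN, "Initial Plan"))) # True
```

For the output quoted above, the sketch finds `CollectLimit` in the initial plan and no limit operator in the final plan.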
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org
