Tanin Na Nakorn created SPARK-39602:
---------------------------------------

             Summary: Invoking .repartition(100000) in a unit test causes the unit test to take >20 minutes.
                 Key: SPARK-39602
                 URL: https://issues.apache.org/jira/browse/SPARK-39602
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Tanin Na Nakorn


Here's a proof of concept: 

{code}
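// Assumes `spark` is an existing SparkSession (e.g. one provided by the test harness).
import spark.implicits._ // supplies the Encoder[String] that createDataset needs

val result = spark
  .createDataset(List("test"))
  .rdd
  .repartition(100000) // 100,000 partitions for a single-element dataset
  .map { _ =>
    "test"
  }
  .collect()
  .toList

println(result)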
{code}

This code takes a very long time (more than 20 minutes) to run in a unit test.

Our unit tests aim to verify the correctness of the logic, not to exercise the repartitioning itself.

Is there a way to make this faster, e.g. by disabling repartitioning in tests?
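
One possible workaround, sketched here under assumptions rather than as a built-in Spark switch: pass the partition count in as a parameter so that production code can use 100000 while a test uses a small value. Each partition is executed as a separate task, so a smaller count means far fewer tasks to schedule. The object name `RepartitionSketch`, the method `run`, the `numPartitions` parameter, and the `local[2]` master are all hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession

object RepartitionSketch {
  // Hypothetical helper: same pipeline as the proof of concept, but the
  // partition count is supplied by the caller instead of being hard-coded.
  def run(spark: SparkSession, numPartitions: Int): List[String] = {
    import spark.implicits._ // Encoder[String] for createDataset
    spark
      .createDataset(List("test"))
      .rdd
      .repartition(numPartitions)
      .map(_ => "test")
      .collect()
      .toList
  }

  def main(args: Array[String]): Unit = {
    // Assumed local test session; a real test suite may already provide one.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartition-sketch")
      .getOrCreate()
    try {
      // A test would pass a small count; production would pass 100000.
      println(run(spark, numPartitions = 4))
    } finally {
      spark.stop()
    }
  }
}
```

This keeps the logic under test identical and changes only the number of tasks. It does not make `.repartition(100000)` itself any faster.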



